Preface


I will be happy with this preface if three important points come through clearly: 


1. The beauty and variety of linear algebra, and its extreme usefulness 
2. The goals of this book, and the new features in this Fourth Edition 


3. The steady support from our linear algebra websites and the video lectures 


May I begin with notes about two websites that are constantly used, and the new one. 


ocw.mit.edu Messages come from thousands of students and faculty about linear algebra 
on this OpenCourseWare site. The 18.06 course includes video lectures of a complete 
semester of classes. Those lectures offer an independent review of the whole subject based 
on this textbook—the professor’s time stays free and the student’s time can be 3 a.m. 
(The reader doesn’t have to be in a class at all.) A million viewers around the world have 
seen these videos (amazing). I hope you find them helpful. 


web.mit.edu/18.06 This site has homeworks and exams (with solutions) for the current 
course as it is taught, and as far back as 1996. There are also review questions, Java demos, 
Teaching Codes, and short essays (and the video lectures). My goal is to make this book 
as useful as possible, with all the course material we can provide. 


math.mit.edu/linearalgebra The newest website is devoted specifically to this Fourth
Edition. It will be a permanent record of ideas and codes and good problems and solutions.
Several sections of the book are directly available online, plus notes on teaching linear 
algebra. The content is growing quickly and contributions are welcome from everyone. 


The Fourth Edition 


Thousands of readers know earlier editions of Introduction to Linear Algebra. The new 
cover shows the Four Fundamental Subspaces—the row space and nullspace are on 
the left side, the column space and the nullspace of $A^T$ are on the right. It is not usual
to put the central ideas of the subject on display like this! You will meet those four spaces 
in Chapter 3, and you will understand why that picture is so central to linear algebra. 
Those were named the Four Fundamental Subspaces in my first book, and they start 
from a matrix A. Each row of A is a vector in n-dimensional space. When the matrix
has m rows, each column is a vector in m-dimensional space. The crucial operation in
linear algebra is taking linear combinations of vectors. (That idea starts on page 1 of the 
book and never stops.) When we take all linear combinations of the column vectors, we get 
the column space. If this space includes the vector b, we can solve the equation Ax = b. 


I have to stop here or you won’t read the book. May I call special attention to the new 
Section 1.3, in which these ideas come early, with two specific examples. You are not
expected to catch every detail of vector spaces in one day! But you will see the first matrices 
in the book, and a picture of their column spaces, and even an inverse matrix. You will be 
learning the language of linear algebra in the best and most efficient way: by using it. 


Every section of the basic course now ends with Challenge Problems. They follow a 
large collection of review problems, which ask you to use the ideas in that section: the
dimension of the column space, a basis for that space, the rank and inverse and determinant 
and eigenvalues of A. Many problems look for computations by hand on a small matrix, 
and they have been highly praised. The new Challenge Problems go a step further, and 
sometimes they go deeper. Let me give four examples: 


Section 2.1: Which row exchanges of a Sudoku matrix produce another Sudoku matrix? 


Section 2.4: From the shapes of A, B, C, is it faster to compute AB times C or A times BC? 


Background: The great fact about multiplying matrices is that AB times C gives the same 
answer as A times BC. This simple statement is the reason behind the rule for matrix 
multiplication. If AB is square and C is a vector, it’s faster to do BC first. Then multiply 
by A to produce ABC. The question asks about other shapes of A, B, and C. 
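
The cost comparison in this background note can be sketched by counting multiplications. This is only an illustration under stated assumptions: the function name `multiplication_counts` and the sizes 100 by 100 are hypothetical, and the count assumes the ordinary row-times-column rule, where an r by s matrix times an s by t matrix uses r·s·t multiplications.

```python
def multiplication_counts(m, n, p, q):
    """Count scalar multiplications for (AB)C and A(BC).

    Assumed shapes: A is m by n, B is n by p, C is p by q.
    An (r by s) times (s by t) product costs r*s*t multiplications.
    """
    ab_then_c = m * n * p + m * p * q  # form AB (m by p), then multiply by C
    bc_then_a = n * p * q + m * n * q  # form BC (n by q), then multiply by A
    return ab_then_c, bc_then_a


# A and B are 100 by 100 (so AB is square) and C is a column vector.
ab_first, bc_first = multiplication_counts(100, 100, 100, 1)
print(ab_first, bc_first)
```

With these sizes, computing BC first keeps every intermediate result a vector, which is why it needs far fewer multiplications. Other shapes can be tried by changing the four arguments.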


Section 3.4: If Ax = b and Cx = b have the same solutions for every b, is A = C? 


Section 4.1: What conditions on the four vectors r, n, c, ℓ allow them to be bases for
the row space, the nullspace, the column space, and the left nullspace of a 2 by 2 matrix? 


The Start of the Course 


The equation Ax = b uses the language of linear combinations right away. The vector 
Ax is a combination of the columns of A. The equation is asking for a combination that 
produces b. The solution vector x comes at three levels and all are important: 


1. Direct solution to find x by forward elimination and back substitution. 
2. Matrix solution using the inverse of A: $x = A^{-1}b$ (if A has an inverse).


3. Vector space solution x = y + z as shown on the cover of the book: 


Particular solution (to Ay = b) plus nullspace solution (to Az = 0)

Direct elimination is the most frequently used algorithm in scientific computing, and the
idea is not hard. Simplify the matrix A so it becomes triangular; then all solutions come
quickly. I don’t spend forever on practicing elimination; it will get learned.

The speed of every new supercomputer is tested on Ax = b: it’s pure linear algebra.
IBM and Los Alamos announced a new world record of $10^{15}$ operations per second in 2008.
That petaflop speed was reached by solving many equations in parallel. High performance
computers avoid operating on single numbers; they feed on whole submatrices.

The processors in the Roadrunner are based on the Cell Engine in PlayStation 3. 
What can I say, video games are now the largest market for the fastest computations. 

Even a supercomputer doesn’t want the inverse matrix: too slow. Inverses give the simplest
formula $x = A^{-1}b$ but not the top speed. And everyone must know that determinants
are even slower—there is no way a linear algebra course should begin with formulas for 
the determinant of an n by n matrix. Those formulas have a place, but not first place. 


Structure of the Textbook 


Already in this preface, you can see the style of the book and its goal. That goal is serious, 
to explain this beautiful and useful part of mathematics. You will see how the applications 
of linear algebra reinforce the key ideas. I hope every teacher will learn something new; 
familiar ideas can be seen in a new way. The book moves gradually and steadily from 
numbers to vectors to subspaces—each level comes naturally and everyone can get it. 


Here are ten points about the organization of this book: 


1. Chapter 1 starts with vectors and dot products. If the class has met them before, 
focus quickly on linear combinations. The new Section 1.3 provides three indepen- 
dent vectors whose combinations fill all of 3-dimensional space, and three depen- 
dent vectors in a plane. Those two examples are the beginning of linear algebra. 


2. Chapter 2 shows the row picture and the column picture of Ax = b. The heart of 
linear algebra is in that connection between the rows of A and the columns: the 
same numbers but very different pictures. Then begins the algebra of matrices: an 
elimination matrix E multiplies A to produce a zero. The goal here is to capture 
the whole process—start with A and end with an upper triangular U. 


Elimination is seen in the beautiful form A = LU. The lower triangular L holds 
all the forward elimination steps, and U is the matrix for back substitution. 


3. Chapter 3 is linear algebra at the best level: subspaces. The column space contains 
all linear combinations of the columns. The crucial question is: How many of those 
columns are needed? The answer tells us the dimension of the column space, and 
the key information about A. We reach the Fundamental Theorem of Linear Algebra. 


4. Chapter 4 has m equations and only n unknowns. It is almost sure that Ax = b has 
no solution. We cannot throw out equations that are close but not perfectly exact. 
When we solve by least squares, the key will be the matrix $A^T A$. This wonderful
matrix $A^T A$ appears everywhere in applied mathematics, when A is rectangular.


5. Determinants in Chapter 5 give formulas for all that has come before—inverses, 
pivots, volumes in n-dimensional space, and more. We don’t need those formulas to 
compute! They slow us down. But det A = 0 tells when a matrix is singular, and 
that test is the key to eigenvalues.


6. Section 6.1 introduces eigenvalues for 2 by 2 matrices. Many courses want to see
eigenvalues early. It is completely reasonable to come here directly from Chapter 3,
because the determinant is easy for a 2 by 2 matrix. The key equation is $Ax = \lambda x$.


Eigenvalues and eigenvectors are an astonishing way to understand a square matrix. 
They are not for Ax = b, they are for dynamic equations like du/dt = Au. 
The idea is always the same: follow the eigenvectors. In those special directions, 
A acts like a single number (the eigenvalue $\lambda$) and the problem is one-dimensional.


7. Chapter 6 is full of applications. One highlight is diagonalizing a symmetric matrix.
Another highlight—not so well known but more important every day—is the 
diagonalization of any matrix. This needs two sets of eigenvectors, not one, and 
they come (of course!) from $A^T A$ and $A A^T$. This Singular Value Decomposition
often marks the end of the basic course and the start of a second course. 


8. Chapter 7 explains the linear transformation approach: it is linear algebra without
coordinates, the ideas without computations. Chapter 9 is the opposite: all about
how Ax = b and $Ax = \lambda x$ are really solved. Then Chapter 10 moves from real
numbers and vectors to complex vectors and matrices. The Fourier matrix F is the
most important complex matrix we will ever see. And the Fast Fourier Transform
(multiplying quickly by F and $F^{-1}$) is a revolutionary algorithm.


9. Chapter 8 is full of applications, more than any single course could need:

8.1 Matrices in Engineering: differential equations replaced by matrix equations
8.2 Graphs and Networks: leading to the edge-node matrix for Kirchhoff’s Laws
8.3 Markov Matrices: as in Google’s PageRank algorithm
8.4 Linear Programming: a new requirement $x \ge 0$ and minimization of the cost
8.5 Fourier Series: linear algebra for functions and digital signal processing
8.6 Matrices in Statistics and Probability: Ax = b is weighted by average errors
8.7 Computer Graphics: matrices move and rotate and compress images.


10. Every section in the basic course ends with a Review of the Key Ideas.


How should computing be included in a linear algebra course? It can open a new 
understanding of matrices—every class will find a balance. I chose the language of 
MATLAB as a direct way to describe linear algebra: eig(ones(4)) will produce the 
eigenvalues 4, 0, 0, 0 of the 4 by 4 all-ones matrix. Go to netlib.org for codes. 
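
For readers who choose a different system, here is a minimal Python sketch that checks those four eigenvalues directly from the key equation $Ax = \lambda x$. The eigenvectors listed are one possible choice made for this illustration (they are not taken from the text), and `matvec` is a hypothetical helper name.

```python
def matvec(A, x):
    """Multiply the matrix A (a list of rows) by the vector x."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


ones4 = [[1] * 4 for _ in range(4)]  # the 4 by 4 all-ones matrix

# Assumed eigenpairs (eigenvalue, eigenvector), chosen for illustration.
eigenpairs = [
    (4, [1, 1, 1, 1]),
    (0, [1, -1, 0, 0]),
    (0, [1, 0, -1, 0]),
    (0, [1, 0, 0, -1]),
]

# Each pair should satisfy A x = lambda x.
checks = [matvec(ones4, x) == [lam * xi for xi in x] for lam, x in eigenpairs]
print(checks)
```

Every row of the all-ones matrix adds up the components of x, so any vector whose components sum to zero is sent to the zero vector. That accounts for the eigenvalue 0 appearing three times.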


You can freely choose a different system. More and more software is open source. 


The new website math.mit.edu/linearalgebra provides further ideas about teaching and 
learning. Please contribute! Good problems are welcome by email: gs@math.mit.edu. 
Send new applications too, linear algebra is an incredibly useful subject. 


The Variety of Linear Algebra


Calculus is mostly about one special operation (the derivative) and its inverse (the integral). 
Of course I admit that calculus could be important .... But so many applications of math- 
ematics are discrete rather than continuous, digital rather than analog. The century of data 
has begun! You will find a light-hearted essay called “Too Much Calculus” on my website. 
The truth is that vectors and matrices have become the language to know. 

Part of that language is the wonderful variety of matrices. Let me give three examples: 


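Symmetric matrix          Orthogonal matrix          Triangular matrix

$$
\begin{bmatrix} 2 & -1 & 0 & 0 \\ -1 & 2 & -1 & 0 \\ 0 & -1 & 2 & -1 \\ 0 & 0 & -1 & 2 \end{bmatrix}
\qquad
\frac{1}{2}\begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & -1 & 1 & -1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \end{bmatrix}
\qquad
\begin{bmatrix} 1 & 1 & 1 & 1 \\ 0 & 1 & 1 & 1 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 1 \end{bmatrix}
$$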


A key goal is learning to “read” a matrix. You need to see the meaning in the numbers. 
This is really the essence of mathematics—patterns and their meaning. 

May I end with this thought for professors. You might feel that the direction is right, 
and wonder if your students are ready. Just give them a chance! Literally thousands of 
students have written to me, frequently with suggestions and surprisingly often with thanks. 
They know this course has a purpose, because the professor and the book are on their side. 
Linear algebra is a fantastic subject, enjoy it. 


Help With This Book 


I can’t even name all the friends who helped me, beyond thanking Brett Coonley at MIT 
and Valutone in Mumbai and SIAM in Philadelphia for years of constant and dedicated 
support. The greatest encouragement of all is the feeling that you are doing something 
worthwhile with your life. Hundreds of generous readers have sent ideas and examples and 
corrections (and favorite matrices!) that appear in this book. Thank you all.


Background of the Author 


This is my eighth textbook on linear algebra, and I have not written about myself before. 
I hesitate to do it now. It is the mathematics that is important, and the reader. The next 
paragraphs add something personal as a way to say that textbooks are written by people. 

I was born in Chicago and went to school in Washington and Cincinnati and St. Louis. 
My college was MIT (and my linear algebra course was extremely abstract). After that 
came Oxford and UCLA, then back to MIT for a very long time. I don’t know how many 
thousands of students have taken 18.06 (more than a million when you include the videos 
on ocw.mit.edu). The time for a fresh approach was right, because this fantastic subject 
was only revealed to math majors—we needed to open linear algebra to the world. 

Those years of teaching led to the Haimo Prize from the Mathematical Association of 
America. For encouraging education worldwide, the International Congress of Industrial 
and Applied Mathematics awarded me the first Su Buchin Prize. I am extremely grateful, 
more than I could possibly say. What I hope most is that you will like linear algebra. 


Chapter 1 


Introduction to Vectors 


The heart of linear algebra is in two operations—both with vectors. We add vectors to get 
v + w. We multiply them by numbers c and d to get cv and dw. Combining those two 
operations (adding cv to dw) gives the linear combination cv + dw. 


Example $v + w = \begin{bmatrix} 1 \\ 1 \end{bmatrix} + \begin{bmatrix} 2 \\ 3 \end{bmatrix} = \begin{bmatrix} 3 \\ 4 \end{bmatrix}$ is the combination with c = d = 1


Linear combinations are all-important in this subject! Sometimes we want one partic- 
ular combination, the specific choice c = 2 and d = 1 that produces cv + dw = (4,5). 
Other times we want all the combinations of v and w (coming from all c and d). 
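
A linear combination is easy to compute one component at a time. The sketch below is an illustration, not code from the book: `combine` is a hypothetical helper, and the vectors v = (1, 1) and w = (2, 3) are assumed here because they are consistent with the combination 2v + w = (4, 5) mentioned above.

```python
def combine(coeffs, vectors):
    """Return the linear combination sum of c_i * v_i, component by component."""
    n = len(vectors[0])
    return tuple(
        sum(c * vec[k] for c, vec in zip(coeffs, vectors)) for k in range(n)
    )


v = (1, 1)  # assumed example vector
w = (2, 3)  # assumed example vector

sum_vw = combine((1, 1), (v, w))    # c = d = 1 gives v + w
special = combine((2, 1), (v, w))   # c = 2, d = 1 gives 2v + w
print(sum_vw, special)
```

The same function accepts any number of vectors with any number of components, which matches the step from two dimensions to n dimensions described in this chapter.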

The vectors cv lie along a line. When w is not on that line, the combinations cv + dw
fill the whole two-dimensional plane. (I have to say “two-dimensional” because linear
algebra allows higher-dimensional planes.) Starting from four vectors u, v, w, z in
four-dimensional space, their combinations cu + dv + ew + fz are likely to fill the space,
but not always. The vectors and their combinations could even lie on one line.

Chapter 1 explains these central ideas, on which everything builds. We start with two- 
dimensional vectors and three-dimensional vectors, which are reasonable to draw. Then 
we move into higher dimensions. The really impressive feature of linear algebra is how 
smoothly it takes that step into n-dimensional space. Your mental picture stays completely 
correct, even if drawing a ten-dimensional vector is impossible. 

This is where the book is going (into n-dimensional space). The first steps are the 
operations in Sections 1.1 and 1.2. Then Section 1.3 outlines three fundamental ideas. 


1.1 Vector addition v + w and linear combinations cv + dw.
1.2 The dot product $v \cdot w$ of two vectors and the length $\|v\| = \sqrt{v \cdot v}$.
1.3 Matrices A, linear equations Ax = b, solutions $x = A^{-1}b$.


1.1 Vectors and Linear Combinations


“You can’t add apples and oranges.” In a strange way, this is the reason for vectors. 
We have two separate numbers $v_1$ and $v_2$. That pair produces a two-dimensional vector v:


Column vector  $v = \begin{bmatrix} v_1 \\ v_2 \end{bmatrix}$   $v_1$ = first component, $v_2$ = second component


We write v as a column, not as a row. The main point so far is to have a single letter v
(in boldface italic) for this pair of numbers $v_1$ and $v_2$ (in lightface italic).


Even if we don’t add $v_1$ to $v_2$, we do add vectors. The first components of v and w stay
separate from the second components:


VECTOR ADDITION  $v = \begin{bmatrix} v_1 \\ v_2 \end{bmatrix}$ and $w = \begin{bmatrix} w_1 \\ w_2 \end{bmatrix}$ add to $v + w = \begin{bmatrix} v_1 + w_1 \\ v_2 + w_2 \end{bmatrix}$.


You see the reason. We want to add apples to apples. Subtraction of vectors follows the
same idea: The components of v − w are $v_1 - w_1$ and $v_2 - w_2$.


The other basic operation is scalar multiplication. Vectors can be multiplied by 2 or by 
—1 or by any number c. There are two ways to double a vector. One way is to add v + v. 
The other way (the usual way) is to multiply each component by 2: 


SCALAR MULTIPLICATION  $2v = \begin{bmatrix} 2v_1 \\ 2v_2 \end{bmatrix}$ and $-v = \begin{bmatrix} -v_1 \\ -v_2 \end{bmatrix}$.


The components of cv are $cv_1$ and $cv_2$. The number c is called a “scalar”.


Notice that the sum of —v and v is the zero vector. This is 0, which is not the same as 
the number zero! The vector 0 has components 0 and 0. Forgive me for hammering away 
at the difference between a vector and its components. Linear algebra is built on these 
operations v + w and cv: adding vectors and multiplying by scalars.


The order of addition makes no difference: v + w equals w + v. Check that by algebra:
The first component is $v_1 + w_1$ which equals $w_1 + v_1$. Check also by an example:


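$v + w = \begin{bmatrix} 1 \\ 5 \end{bmatrix} + \begin{bmatrix} 3 \\ 3 \end{bmatrix} = \begin{bmatrix} 4 \\ 8 \end{bmatrix}$   $w + v = \begin{bmatrix} 3 \\ 3 \end{bmatrix} + \begin{bmatrix} 1 \\ 5 \end{bmatrix} = \begin{bmatrix} 4 \\ 8 \end{bmatrix}$.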


Linear Combinations


Combining addition with scalar multiplication, we now form “linear combinations” of v
and w. Multiply v by c and multiply w by d; then add cv + dw.


DEFINITION The sum of cv and dw is a linear combination of v and w.


Four special linear combinations are: sum, difference, zero, and a scalar multiple cv:


1v + 1w = sum of vectors in Figure 1.1a
1v − 1w = difference of vectors in Figure 1.1b
0v + 0w = zero vector
cv + 0w = vector cv in the direction of v


The zero vector is always a possible combination (its coefficients are zero). Every time we 
see a “space” of vectors, that zero vector will be included. This big view, taking all the 
combinations of v and w, is linear algebra at work. 

The figures show how you can visualize vectors. For algebra, we just need the
components (like 4 and 2). That vector v is represented by an arrow. The arrow goes $v_1 = 4$
units to the right and $v_2 = 2$ units up. It ends at the point whose x, y coordinates are 4, 2.
This point is another representation of the vector, so we have three ways to describe v:


Represent vector v:   Two numbers   Arrow from (0,0)   Point in the plane


We add using the numbers. We visualize v + w using arrows: 
Vector addition (head to tail) At the end of v, place the start of w. 


Figure 1.1: Vector addition v + w = (3, 4) produces the diagonal of a parallelogram. 
The linear combination on the right is v − w = (5, 0).


We travel along v and then along w. Or we take the diagonal shortcut along v + w. We 
could also go along w and then v. In other words, w + v gives the same answer as v + w. 
These are different ways along the parallelogram (in this example it is a rectangle). The
sum is the diagonal vector v + w. 

The zero vector 0 = (0,0) is too short to draw a decent arrow, but you know that 
v + 0 = v. For 2v we double the length of the arrow. We reverse w to get —w. This 
reversing gives the subtraction on the right side of Figure 1.1. 


Vectors in Three Dimensions 


A vector with two components corresponds to a point in the xy plane. The components of v 
are the coordinates of the point: $x = v_1$ and $y = v_2$. The arrow ends at this point $(v_1, v_2)$,
when it starts from (0, 0). Now we allow vectors to have three components $(v_1, v_2, v_3)$.

The xy plane is replaced by three-dimensional space. Here are typical vectors (still 
column vectors but with three components): 


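$v = \begin{bmatrix} 1 \\ 1 \\ -1 \end{bmatrix}$ and $w = \begin{bmatrix} 2 \\ 3 \\ 4 \end{bmatrix}$ and $v + w = \begin{bmatrix} 3 \\ 4 \\ 3 \end{bmatrix}$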


The vector v corresponds to an arrow in 3-space. Usually the arrow starts at the “origin”, 
where the xyz axes meet and the coordinates are (0,0,0). The arrow ends at the point 
with coordinates v1, v2, v3. There is a perfect match between the column vector and the 
arrow from the origin and the point where the arrow ends. 


Figure 1.2: Vectors $\begin{bmatrix} x \\ y \end{bmatrix}$ and $\begin{bmatrix} x \\ y \\ z \end{bmatrix}$ correspond to points (x, y) and (x, y, z).


From now on $v = \begin{bmatrix} 1 \\ 1 \\ -1 \end{bmatrix}$ is also written as v = (1, 1, −1).

The reason for the row form (in parentheses) is to save space. But v = (1, 1, −1) is
not a row vector! It is in actuality a column vector, just temporarily lying down. The row
vector [1 1 −1] is absolutely different, even though it has the same three components.
That row vector is the “transpose” of the column v. 

In three dimensions, v + w is still found a component at a time. The sum has
components $v_1 + w_1$ and $v_2 + w_2$ and $v_3 + w_3$. You see how to add vectors in 4 or 5
or n dimensions. When w starts at the end of v, the third side is v + w. The other way
around the parallelogram is w + v. Question: Do the four sides all lie in the same plane?
Yes. And the sum v + w − v − w goes completely around to produce the ____ vector.

A typical linear combination of three vectors in three dimensions is u + 4v − 2w:


Linear combination: multiply by 1, 4, −2, then add

$$
\begin{bmatrix} 1 \\ 0 \\ 3 \end{bmatrix} + 4\begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix} - 2\begin{bmatrix} 2 \\ 3 \\ -1 \end{bmatrix} = \begin{bmatrix} 1 \\ 2 \\ 9 \end{bmatrix}
$$


The Important Questions 


For one vector u, the only linear combinations are the multiples cu. For two vectors, 
the combinations are cu + dv. For three vectors, the combinations are cu + dv + ew. 
Will you take the big step from one combination to all combinations? Every c and d and e 
are allowed. Suppose the vectors u, v, w are in three-dimensional space: 


1. What is the picture of all combinations cu? 
2. What is the picture of all combinations cu + dv? 
3. What is the picture of all combinations cu + dv + ew? 


The answers depend on the particular vectors u, v, and w. If they were zero vectors (a very 
extreme case), then every combination would be zero. If they are typical nonzero vectors 
(components chosen at random), here are the three answers. This is the key to our subject: 


1. The combinations cu fill a line. 
2. The combinations cu + dv fill a plane.
3. The combinations cu + dv + ew fill three-dimensional space. 


The zero vector (0, 0,0) is on the line because c can be zero. It is on the plane because c 
and d can be zero. The line of vectors cu is infinitely long (forward and backward). It is the 
plane of all cu + dv (combining two vectors in three-dimensional space) that I especially 
ask you to think about. 


Adding all cu on one line to all dv on the other line fills in the plane in Figure 1.3.
When we include a third vector w, the multiples ew give a third line. Suppose that third
line is not in the plane of u and v. Then combining all ew with all cu + dv fills up the
whole three-dimensional space.


Figure 1.3: (a) Line through u. (b) The plane containing the lines through u and v.


This is the typical situation! Line, then plane, then space. But other possibilities exist. 
When w happens to be cu + dv, the third vector is in the plane of the first two. The 
combinations of u, v, w will not go outside that uv plane. We do not get the full three- 
dimensional space. Please think about the special cases in Problem 1. 
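
One way to test on a computer whether three vectors fill three-dimensional space is the 3 by 3 determinant with rows u, v, w. This is a swapped-in technique for illustration, since determinants are not developed until Chapter 5; the names `triple_product` and `fills_space` and the example vectors are hypothetical. The combinations fill the whole space exactly when this number is not zero.

```python
def triple_product(u, v, w):
    """Return the 3 by 3 determinant whose rows are u, v, w."""
    return (
        u[0] * (v[1] * w[2] - v[2] * w[1])
        - u[1] * (v[0] * w[2] - v[2] * w[0])
        + u[2] * (v[0] * w[1] - v[1] * w[0])
    )


def fills_space(u, v, w):
    """True when the combinations cu + dv + ew fill three-dimensional space."""
    return triple_product(u, v, w) != 0


u = (1, 0, 0)
v = (0, 1, 0)
w_in_plane = (2, 3, 0)   # equals 2u + 3v, so it stays in the uv plane
w_outside = (0, 0, 1)    # not a combination of u and v

print(fills_space(u, v, w_in_plane), fills_space(u, v, w_outside))
```

The test is exact for whole-number components. With floating-point components, comparing against zero would need a tolerance.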


REVIEW OF THE KEY IDEAS


1. A vector v in two-dimensional space has two components $v_1$ and $v_2$.

2. $v + w = (v_1 + w_1, v_2 + w_2)$ and $cv = (cv_1, cv_2)$ are found a component at a time.

3. A linear combination of three vectors u and v and w is cu + dv + ew.

4. Take all linear combinations of u, or u and v, or u, v, w. In three dimensions,
those combinations typically fill a line, then a plane, and the whole space $\mathbf{R}^3$.


WORKED EXAMPLES

1.1 A The linear combinations of v = (1, 1, 0) and w = (0, 1, 1) fill a plane. Describe
that plane. Find a vector that is not a combination of v and w.


Solution The combinations cv + dw fill a plane in $\mathbf{R}^3$. The vectors in that plane allow
any c and d. The plane of Figure 1.3 fills in between the “u-line” and the “v-line”.


Combinations $cv + dw = c\begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix} + d\begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix} = \begin{bmatrix} c \\ c + d \\ d \end{bmatrix}$ fill a plane.


Four particular vectors in that plane are (0, 0, 0) and (2, 3, 1) and (5, 7, 2) and
$(\pi, 2\pi, \pi)$. The second component c + d is always the sum of the first and third
components. The vector (1, 2, 3) is not in the plane, because $2 \ne 1 + 3$.
Another description of this plane through (0, 0, 0) is to know that n = (1, −1, 1) is
perpendicular to the plane. Section 1.2 will confirm that 90° angle by testing dot products:
$v \cdot n = 0$ and $w \cdot n = 0$.
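
Both descriptions of this plane can be checked with a few lines of Python. This is a minimal sketch: the helper names `dot`, `in_plane`, and `coefficients` are hypothetical, and the test "perpendicular to n" is applied with the dot product that Section 1.2 defines.

```python
def dot(a, b):
    """Dot product: multiply matching components and add."""
    return sum(x * y for x, y in zip(a, b))


n = (1, -1, 1)  # perpendicular to the plane of v = (1, 1, 0) and w = (0, 1, 1)


def in_plane(p):
    """A point p is in the plane exactly when p is perpendicular to n."""
    return dot(p, n) == 0


def coefficients(p):
    """Return (c, d) with cv + dw = p, or None when p is not in the plane.

    Since cv + dw = (c, c + d, d), c is the first component and d the third.
    """
    return (p[0], p[2]) if in_plane(p) else None


print(in_plane((2, 3, 1)), in_plane((5, 7, 2)), in_plane((1, 2, 3)))
print(coefficients((2, 3, 1)), coefficients((1, 2, 3)))
```

The point (2, 3, 1) comes from c = 2 and d = 1, while (1, 2, 3) fails the test because its middle component is not the sum of the other two.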


1.1 B For v = (1, 0) and w = (0, 1), describe all points cv with (1) whole numbers c
(2) nonnegative $c \ge 0$. Then add all vectors dw and describe all cv + dw.


Solution 


(1) The vectors cv = (c, 0) with whole numbers c are equally spaced points along the
x axis (the direction of v). They include (−2, 0), (−1, 0), (0, 0), (1, 0), (2, 0).


(2) The vectors cv with $c \ge 0$ fill a half-line. It is the positive x axis. This half-line
starts at (0, 0) where c = 0. It includes $(\pi, 0)$ but not $(-\pi, 0)$.


(1') Adding all vectors dw = (0, d) puts a vertical line through those points cv. We
have infinitely many parallel lines from (whole number c, any number d).


(2') Adding all vectors dw puts a vertical line through every cv on the half-line. Now
we have a half-plane. It is the right half of the xy plane (any $x \ge 0$, any height y).


1.1 C Find two equations for the unknowns c and d so that the linear combination 
cv + dw equals the vector b: 


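$v = \begin{bmatrix} 2 \\ -1 \end{bmatrix}$   $w = \begin{bmatrix} -1 \\ 2 \end{bmatrix}$   $b = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$
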
Solution In applying mathematics, many problems have two parts: 
1 Modeling part Express the problem by a set of equations. 


2 Computational part Solve those equations by a fast and accurate algorithm. 


Here we are only asked for the first part (the equations). Chapter 2 is devoted to the second 
part (the algorithm). Our example fits into a fundamental model for linear algebra: 


Find $c_1, \ldots, c_n$ so that $c_1 v_1 + \cdots + c_n v_n = b$.


For n = 2 we could find a formula for the c’s. The “elimination method” in Chapter 2
succeeds far beyond n = 100. For n greater than 1 million, see Chapter 9. Here n = 2:


Vector equation $c\begin{bmatrix} 2 \\ -1 \end{bmatrix} + d\begin{bmatrix} -1 \\ 2 \end{bmatrix} = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$


The required equations for c and d just come from the two components separately: 


Two scalar equations

$$
\begin{aligned} 2c - d &= 1 \\ -c + 2d &= 0 \end{aligned}
$$

You could think of those as two lines that cross at the solution $c = \frac{2}{3}$, $d = \frac{1}{3}$.
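
The computational part for n = 2 can be sketched with one elimination step followed by back substitution. This is an illustration under stated assumptions, not the algorithm as Chapter 2 presents it: the function name `solve_2x2` is hypothetical, no row exchange is attempted, and exact fractions are used so the answer is not rounded.

```python
from fractions import Fraction


def solve_2x2(a11, a12, a21, a22, b1, b2):
    """Solve a11*c + a12*d = b1 and a21*c + a22*d = b2 by elimination."""
    a11, a12, a21, a22, b1, b2 = map(Fraction, (a11, a12, a21, a22, b1, b2))
    if a11 == 0:
        raise ValueError("first pivot is zero; exchange the equations first")
    multiplier = a21 / a11              # subtract this multiple of equation 1
    pivot2 = a22 - multiplier * a12     # second pivot after elimination
    if pivot2 == 0:
        raise ValueError("no unique solution")
    d = (b2 - multiplier * b1) / pivot2  # back substitution: last unknown first
    c = (b1 - a12 * d) / a11
    return c, d


# The two scalar equations: 2c - d = 1 and -c + 2d = 0.
c, d = solve_2x2(2, -1, -1, 2, 1, 0)
print(c, d)
```

Substituting the result back into both equations is a quick check that the two lines really cross at that point.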


Problem Set 1.1


Problems 1-9 are about addition of vectors and linear combinations. 


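1 Describe geometrically (line, plane, or all of $\mathbf{R}^3$) all linear combinations of

(a) $\begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}$ and $\begin{bmatrix} 3 \\ 6 \\ 9 \end{bmatrix}$   (b) $\begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}$ and $\begin{bmatrix} 0 \\ 2 \\ 3 \end{bmatrix}$   (c) $\begin{bmatrix} 2 \\ 0 \\ 0 \end{bmatrix}$ and $\begin{bmatrix} 0 \\ 2 \\ 2 \end{bmatrix}$ and $\begin{bmatrix} 2 \\ 2 \\ 3 \end{bmatrix}$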


2 Draw v = | f | and w = | -5 | and v+ w and v—w in a single xy plane. 


3 If $v + w = \begin{bmatrix} 5 \\ 1 \end{bmatrix}$ and $v - w = \begin{bmatrix} 1 \\ 5 \end{bmatrix}$, compute and draw v and w.


4 From v = | h | and w = | ; |, find the components of 3v + w and cv + dw. 


5 Compute u + v + w and 2u + 2v + w. How do you know u, v, w lie in a plane? 


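In a plane   $u = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}$, $v = \begin{bmatrix} -3 \\ 1 \\ -2 \end{bmatrix}$, $w = \begin{bmatrix} 2 \\ -3 \\ -1 \end{bmatrix}$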


6 Every combination of v = (1, −2, 1) and w = (0, 1, −1) has components that add
to ____. Find c and d so that cv + dw = (3, 3, −6).


7 In the xy plane mark all nine of these linear combinations: 


SHEIN with c=0,1,2 and d =0,1,2. 


8 The parallelogram in Figure 1.1 has diagonal v + w. What is its other diagonal?
What is the sum of the two diagonals? Draw that vector sum. 


9 If three corners of a parallelogram are (1, 1), (4,2), and (1, 3), what are all three of 
the possible fourth corners? Draw two of them. 


Problems 10-14 are about special vectors on cubes and clocks in Figure 1.4. 


10 Which point of the cube is i + j? Which point is the vector sum of i = (1,0,0) 
and j = (0,1,0) and k = (0,0, 1)? Describe all points (x, y, z) in the cube. 


11 Four corners of the cube are (0, 0, 0), (1,0, 0), (0, 1,0), (0,0, 1). What are the other 
four corners? Find the coordinates of the center point of the cube. The center points
of the six faces are ____.


12 How many corners does a cube have in 4 dimensions? How many 3D faces? 
How many edges? A typical corner is (0, 0, 1,0). A typical edge goes to (0, 1, 0, 0). 


Figure 1.4: Unit cube from i, j, k and twelve clock vectors.


13 (a) What is the sum V of the twelve vectors that go from the center of a clock to
the hours 1:00, 2:00, ..., 12:00?
(b) If the 2:00 vector is removed, why do the 11 remaining vectors add to 8:00?
(c) What are the components of that 2:00 vector $v = (\cos\theta, \sin\theta)$?


14 Suppose the twelve vectors start from 6:00 at the bottom instead of (0,0) at the 
center. The vector to 12:00 is doubled to (0, 2). Add the new twelve vectors. 


Problems 15-19 go further with linear combinations of v and w (Figure 1.5a). 
15 Figure 1.5a shows 4v + 4w. Mark the points $v + jw and $v + tw and v + w. 


16 Mark the point −v + 2w and any other combination cv + dw with c + d = 1.
Draw the line of all combinations that have c + d = 1. 


17 Locate v + $w and $v + 3w. The combinations cv + cw fill out what line? 
18 Restricted by $0 \le c \le 1$ and $0 \le d \le 1$, shade in all combinations cv + dw.


19 Restricted only by $c \ge 0$ and $d \ge 0$ draw the “cone” of all combinations cv + dw.


Figure 1.5: Problems 15-19 in a plane. Problems 20-25 in 3-dimensional space.


Problems 20-25 deal with u, v, w in three-dimensional space (see Figure 1.5b).


20 Locate tu + iv + iw and lu + iw in Figure 1.5b. Challenge problem: Under 
what restrictions on c, d, e will the combinations cu + dv + ew fill in the dashed
triangle? To stay in the triangle, one requirement is $c \ge 0$, $d \ge 0$, $e \ge 0$.


21 The three sides of the dashed triangle are v − u and w − v and u − w. Their sum is
____. Draw the head-to-tail addition around a plane triangle of (3, 1) plus (−1, 1)
plus (−2, −2).


22 Shade in the pyramid of combinations cu + dv + ew with $c \ge 0$, $d \ge 0$, $e \ge 0$ and
$c + d + e \le 1$. Mark the vector (u + v + w) as inside or outside this pyramid.


23 If you look at all combinations of those u, v, and w, is there any vector that can’t be
produced from cu + dv + ew? Different answer if u, v, w are all in ____.


24 Which vectors are combinations of u and v, and also combinations of v and w? 
25 Draw vectors u, v, w so that their combinations cu + dv + ew fill only a line. 
Find vectors u, v, w so that their combinations cu + dv + ew fill only a plane. 


26 What combination c B +d H produces Hi Express this question as two 


8 
equations for the coefficients c and d in the linear combination. 


27 Review Question. In xyz space, where is the plane of all linear combinations of 
i = (1, 0, 0) and i + j = (1, 1, 0)?


Challenge Problems 


28 Find vectors v and w so that v + w = (4, 5, 6) and v - w = (2, 5, 8). This is a
question with ____ unknown numbers, and an equal number of equations to find
those numbers.


29 Find two different combinations of the three vectors u = (1,3) and v = (2,7) and 
w = (1,5) that produce b = (0,1). Slightly delicate question: If I take any three 
vectors u, v, w in the plane, will there always be two different combinations that 
produce b = (0, 1)? 


30 The linear combinations of v = (a, b) and w = (c, d) fill the plane unless ____.
Find four vectors u, v, w, z with four components each so that their combinations
cu + dv + ew + fz produce all vectors $(b_1, b_2, b_3, b_4)$ in four-dimensional space.


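31 Write down three equations for c, d, e so that cu + dv + ew = b. Can you somehow
find c, d, and e?

$$u = \begin{bmatrix} 2 \\ -1 \\ 0 \end{bmatrix} \quad v = \begin{bmatrix} -1 \\ 2 \\ -1 \end{bmatrix} \quad w = \begin{bmatrix} 0 \\ -1 \\ 2 \end{bmatrix} \quad b = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}$$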
1.2 Lengths and Dot Products


The first section backed off from multiplying vectors. Now we go forward to define the
“dot product” of v and w. This multiplication involves the separate products $v_1 w_1$ and
$v_2 w_2$, but it doesn’t stop there. Those two numbers are added to produce the single number
v · w. This is the geometry section (lengths and angles).


DEFINITION The dot product or inner product of $v = (v_1, v_2)$ and $w = (w_1, w_2)$
is the number v · w:

$$v \cdot w = v_1 w_1 + v_2 w_2. \tag{1}$$


Example 1 The vectors v = (4, 2) and w = (-1, 2) have a zero dot product:


Dot product is zero, perpendicular vectors:

$$\begin{bmatrix} 4 \\ 2 \end{bmatrix} \cdot \begin{bmatrix} -1 \\ 2 \end{bmatrix} = -4 + 4 = 0.$$


In mathematics, zero is always a special number. For dot products, it means that these 
two vectors are perpendicular. The angle between them is 90°. When we drew them 
in Figure 1.1, we saw a rectangle (not just any parallelogram). The clearest example of 
perpendicular vectors is i = (1,0) along the x axis and j = (0,1) up the y axis. Again 
the dot product is i · j = 0 + 0 = 0. Those vectors i and j form a right angle.

The dot product of v = (1, 2) and w = (3, 1) is 5. Soon v · w will reveal the angle
between v and w (not 90°). Please check that w · v is also 5.


The dot product w · v equals v · w. The order of v and w makes no difference.
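
The definition translates directly into a few lines of code. The sketch below is only an illustration in plain Python; the function name `dot` is an assumption of this example, not notation from the text.

```python
def dot(v, w):
    """Dot product: multiply matching components, then add the products."""
    if len(v) != len(w):
        raise ValueError("v and w must have the same number of components")
    return sum(vi * wi for vi, wi in zip(v, w))


print(dot((4, 2), (-1, 2)))   # Example 1: the perpendicular pair
print(dot((1, 2), (3, 1)))    # v . w
print(dot((3, 1), (1, 2)))    # w . v, the same number
```

The same function works in any number of dimensions, because `zip` pairs the components one by one.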


Example 2 Put a weight of 4 at the point x = -1 (left of zero) and a weight of 2 at the
point x = 2 (right of zero). The x axis will balance on the center point (like a see-saw).
The weights balance because the dot product is (4)(-1) + (2)(2) = 0.

This example is typical of engineering and science. The vector of weights is $(w_1, w_2)$ =
(4, 2). The vector of distances from the center is $(v_1, v_2)$ = (-1, 2). The weights times the
distances, $w_1 v_1$ and $w_2 v_2$, give the “moments”. The equation for the see-saw to balance is
$w_1 v_1 + w_2 v_2 = 0$.


Example 3 Dot products enter in economics and business. We have three goods to buy
and sell. Their prices are $(p_1, p_2, p_3)$ for each unit. This is the “price vector” p. The
quantities we buy or sell are $(q_1, q_2, q_3)$, positive when we sell, negative when we buy.
Selling $q_1$ units at the price $p_1$ brings in $q_1 p_1$. The total income (quantities q times prices
p) is the dot product q · p in three dimensions:


$$\text{Income} = (q_1, q_2, q_3) \cdot (p_1, p_2, p_3) = q_1 p_1 + q_2 p_2 + q_3 p_3 = \text{dot product}.$$


A zero dot product means that “the books balance”. Total sales equal total purchases if
q · p = 0. Then p is perpendicular to q (in three-dimensional space). A supermarket with
thousands of goods goes quickly into high dimensions.
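
A small numerical illustration of Example 3 may help. The prices and quantities below are hypothetical numbers chosen for this sketch (they do not come from the text), picked so that the books balance.

```python
def income(q, p):
    """Total income q . p: quantity times price, summed over all goods."""
    return sum(qi * pi for qi, pi in zip(q, p))


# Hypothetical data, chosen only for illustration.
p = (3, 4, 5)     # price per unit of each of the three goods
q = (5, -5, 1)    # sell 5 of good 1, buy 5 of good 2, sell 1 of good 3

print(income(q, p))   # 15 - 20 + 5
```

Since this income is zero, the quantity vector q is perpendicular to the price vector p.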
Small note: Spreadsheets have become essential in management. They compute linear
combinations and dot products. What you see on the screen is a matrix.


Main point To compute v · w, multiply each $v_i$ times $w_i$. Then add $\sum v_i w_i$.


Lengths and Unit Vectors 


An important case is the dot product of a vector with itself. In this case v equals w. 
When the vector is v = (1, 2, 3), the dot product with itself is $v \cdot v = \|v\|^2 = 14$:


Dot product v · v, length squared:

$$\|v\|^2 = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} \cdot \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} = 1 + 4 + 9 = 14.$$


Instead of a 90° angle between vectors we have 0°. The answer is not zero because v is not
perpendicular to itself. The dot product v · v gives the length of v squared.


$$\text{length} = \|v\| = \sqrt{v \cdot v}.$$


In two dimensions the length is $\sqrt{v_1^2 + v_2^2}$. In three dimensions it is $\sqrt{v_1^2 + v_2^2 + v_3^2}$.
By the calculation above, the length of v = (1, 2, 3) is $\|v\| = \sqrt{14}$.

Here $\|v\| = \sqrt{v \cdot v}$ is just the ordinary length of the arrow that represents the vector.
In two dimensions, the arrow is in a plane. If the components are 1 and 2, the arrow is
the third side of a right triangle (Figure 1.6). The Pythagoras formula $a^2 + b^2 = c^2$,
which connects the three sides, is $1^2 + 2^2 = \|v\|^2$.

For the length of v = (1, 2, 3), we used the right triangle formula twice. The vector
(1, 2, 0) in the base has length $\sqrt{5}$. This base vector is perpendicular to (0, 0, 3) that goes
straight up. So the diagonal of the box has length $\|v\| = \sqrt{5 + 9} = \sqrt{14}$.

The length of a four-dimensional vector would be $\sqrt{v_1^2 + v_2^2 + v_3^2 + v_4^2}$. Thus the
vector (1, 1, 1, 1) has length $\sqrt{1^2 + 1^2 + 1^2 + 1^2} = 2$. This is the diagonal through a unit
cube in four-dimensional space. The diagonal in n dimensions has length $\sqrt{n}$.
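
The length formula is the same in every dimension, which a short sketch makes visible. The name `length` is chosen for this example only; it uses the standard `math.sqrt`.

```python
import math


def length(v):
    """Length ||v|| = sqrt(v . v), valid in any number of dimensions."""
    return math.sqrt(sum(x * x for x in v))


print(length((1, 2, 3)))      # sqrt(14), the diagonal of the 1 by 2 by 3 box
print(length((1, 1, 1, 1)))   # diagonal through a unit cube in 4 dimensions
print(length([1] * 9))        # diagonal in 9 dimensions, sqrt(9)
```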

The word “unit” is always indicating that some measurement equals “one”. The unit 
price is the price for one item. A unit cube has sides of length one. A unit circle is a circle 
with radius one. Now we define the idea of a “unit vector”. 


DEFINITION A unit vector u is a vector whose length equals one. Then u · u = 1.


An example in four dimensions is u = (1/2, 1/2, 1/2, 1/2). Then u · u is 1/4 + 1/4 + 1/4 + 1/4 = 1.
We divided v = (1, 1, 1, 1) by its length ||v|| = 2 to get this unit vector.
Figure 1.6: The length $\sqrt{v \cdot v}$ of two-dimensional and three-dimensional vectors. (Labels in the figure: $5 = 1^2 + 2^2$ for the vector of length $\sqrt{5}$; $14 = 1^2 + 2^2 + 3^2$ for (1, 2, 3), which has length $\sqrt{14}$; (1, 2, 0) has length $\sqrt{5}$.)


Example 4 The standard unit vectors along the x and y axes are written i and j. In the
xy plane, the unit vector that makes an angle “theta” with the x axis is (cos θ, sin θ):


Unit vectors

$$i = \begin{bmatrix} 1 \\ 0 \end{bmatrix} \quad \text{and} \quad j = \begin{bmatrix} 0 \\ 1 \end{bmatrix} \quad \text{and} \quad u = \begin{bmatrix} \cos\theta \\ \sin\theta \end{bmatrix}.$$


When θ = 0, the horizontal vector u is i. When θ = 90° (or π/2 radians), the vertical
vector is j. At any angle, the components cos θ and sin θ produce u · u = 1 because
cos² θ + sin² θ = 1. These vectors reach out to the unit circle in Figure 1.7. Thus cos θ
and sin θ are simply the coordinates of that point at angle θ on the unit circle.

Since (2, 2, 1) has length 3, the vector (2/3, 2/3, 1/3) has length 1. Check that u · u =
4/9 + 4/9 + 1/9 = 1. For a unit vector, divide any nonzero v by its length ||v||.


Unit vector u = v/||v|| is a unit vector in the same direction as v.
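
Dividing by the length is a one-line operation. This is a minimal sketch in plain Python; the name `unit` is an assumption of this example, and the zero vector is rejected because it has no direction.

```python
import math


def unit(v):
    """Return v / ||v||, the unit vector in the direction of a nonzero v."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0:
        raise ValueError("the zero vector has no direction")
    return tuple(x / norm for x in v)


u = unit((2, 2, 1))            # (2, 2, 1) has length 3
print(u)
print(sum(x * x for x in u))   # u . u, equal to 1 up to roundoff
```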


Figure 1.7: The coordinate vectors i and j. The unit vector u at angle 45° (left) divides
v = (1, 1) by its length $\|v\| = \sqrt{2}$. The unit vector u = (cos θ, sin θ) is at angle θ.
The Angle Between Two Vectors


We stated that perpendicular vectors have v · w = 0. The dot product is zero when the
angle is 90°. To explain this, we have to connect angles to dot products. Then we show
how v · w finds the angle between any two nonzero vectors v and w.


Right angles The dot product is v · w = 0 when v is perpendicular to w.


Proof When v and w are perpendicular, they form two sides of a right triangle.
The third side is v - w (the hypotenuse going across in Figure 1.8). The Pythagoras Law
for the sides of a right triangle is $a^2 + b^2 = c^2$:


Perpendicular vectors

$$\|v\|^2 + \|w\|^2 = \|v - w\|^2 \tag{2}$$

Writing out the formulas for those lengths in two dimensions, this equation is

Pythagoras

$$(v_1^2 + v_2^2) + (w_1^2 + w_2^2) = (v_1 - w_1)^2 + (v_2 - w_2)^2. \tag{3}$$


The right side begins with $v_1^2 - 2v_1 w_1 + w_1^2$. Then $v_1^2$ and $w_1^2$ are on both sides of the
equation and they cancel, leaving $-2v_1 w_1$. Also $v_2^2$ and $w_2^2$ cancel, leaving $-2v_2 w_2$.
(In three dimensions there would be $-2v_3 w_3$.) Now divide by -2:


$$0 = -2v_1 w_1 - 2v_2 w_2 \quad \text{which leads to} \quad v_1 w_1 + v_2 w_2 = 0. \tag{4}$$


Conclusion Right angles produce v · w = 0. The dot product is zero when the angle is
θ = 90°. Then cos θ = 0. The zero vector v = 0 is perpendicular to every vector w
because 0 · w is always zero.


Now suppose v · w is not zero. It may be positive, it may be negative. The sign of v · w
immediately tells whether we are below or above a right angle. The angle is less than 90°
when v · w is positive. The angle is above 90° when v · w is negative. The right side of
Figure 1.8 shows a typical vector v = (3, 1). The angle with w = (1, 3) is less than 90°
because v · w = 6 is positive.


Figure 1.8: Perpendicular vectors have v · w = 0. Then $\|v\|^2 + \|w\|^2 = \|v - w\|^2$. (Left: a right triangle with sides $\sqrt{5}$ and $\sqrt{20}$, where 5 + 20 = 25. Right: v · w > 0 with angle below 90° in one half-plane, v · w < 0 with angle above 90° in the other, and v · w = 0 on the dividing line.)
The borderline is where vectors are perpendicular to v. On that dividing line between
plus and minus, (1, -3) is perpendicular to (3, 1). The dot product is zero.

The dot product reveals the exact angle θ. This is not necessary for linear algebra. You
could stop here! Once we have matrices, we won’t come back to θ. But while we are on
the subject of angles, this is the place for the formula.

Start with unit vectors u and U. The sign of u · U tells whether θ < 90° or θ > 90°.
Because the vectors have length 1, we learn more than that. The dot product u · U is the
cosine of θ. This is true in any number of dimensions.


Unit vectors u and U at angle θ have u · U = cos θ. Certainly |u · U| ≤ 1.


Remember that cos θ is never greater than 1. It is never less than -1. The dot product of
unit vectors is between -1 and 1.


Figure 1.9 shows this clearly when the vectors are u = (cos θ, sin θ) and i = (1, 0).
The dot product is u · i = cos θ. That is the cosine of the angle between them.

After rotation through any angle α, these are still unit vectors. The vector i = (1, 0)
rotates to (cos α, sin α). The vector u rotates to (cos β, sin β) with β = α + θ. Their
dot product is cos α cos β + sin α sin β. From trigonometry this is the same as cos(β - α).
But β - α is the angle θ, so the dot product is cos θ.


Figure 1.9: The dot product of unit vectors is the cosine of the angle θ.


Problem 24 proves |u · U| ≤ 1 directly, without mentioning angles. The inequality and
the cosine formula u · U = cos θ are always true for unit vectors.


What if v and w are not unit vectors? Divide by their lengths to get u = v/||v|| and
U = w/||w||. Then the dot product of those unit vectors u and U gives cos θ.


COSINE FORMULA If v and w are nonzero vectors then $\dfrac{v \cdot w}{\|v\|\,\|w\|} = \cos\theta$.
Whatever the angle, this dot product of v/||v|| with w/||w|| never exceeds one. That
is the “Schwarz inequality” |v · w| ≤ ||v|| ||w|| for dot products, or more correctly the
Cauchy-Schwarz-Buniakowsky inequality. It was found in France and Germany
and Russia (and maybe elsewhere; it is the most important inequality in mathematics).

Since |cos θ| never exceeds 1, the cosine formula gives two great inequalities:


SCHWARZ INEQUALITY $|v \cdot w| \le \|v\|\,\|w\|$


TRIANGLE INEQUALITY $\|v + w\| \le \|v\| + \|w\|$


Example 5 Find cos θ for $v = \begin{bmatrix} 2 \\ 1 \end{bmatrix}$ and $w = \begin{bmatrix} 1 \\ 2 \end{bmatrix}$ and check both inequalities.
Solution The dot product is v · w = 4. Both v and w have length $\sqrt{5}$. The cosine is 4/5.


$$\cos\theta = \frac{v \cdot w}{\|v\|\,\|w\|} = \frac{4}{\sqrt{5}\sqrt{5}} = \frac{4}{5}$$

The angle is below 90° because v · w = 4 is positive. By the Schwarz inequality, v · w = 4
is less than ||v|| ||w|| = 5. Side 3 = ||v + w|| is less than side 1 + side 2, by the triangle
inequality. For v + w = (3, 3) that says $\sqrt{18} < \sqrt{5} + \sqrt{5}$. Square this to get 18 < 20.
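
The checks in Example 5 can be repeated numerically. The following is a sketch in plain Python; the helper names `dot` and `length` are assumptions of this example.

```python
import math


def dot(v, w):
    return sum(a * b for a, b in zip(v, w))


def length(v):
    return math.sqrt(dot(v, v))


v, w = (2, 1), (1, 2)
s = tuple(a + b for a, b in zip(v, w))   # v + w = (3, 3)

cosine = dot(v, w) / (length(v) * length(w))
schwarz_ok = abs(dot(v, w)) <= length(v) * length(w)   # 4 <= 5
triangle_ok = length(s) <= length(v) + length(w)       # sqrt(18) <= 2 sqrt(5)

print(cosine, schwarz_ok, triangle_ok)
```

In floating point the cosine comes out as 4/5 only up to roundoff, so compare it with a tolerance rather than with `==`.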


Example 6 The dot product of v = (a, b) and w = (b, a) is 2ab. Both lengths are
$\sqrt{a^2 + b^2}$. The Schwarz inequality in this case says that $2ab \le a^2 + b^2$.

This is more famous if we write $x = a^2$ and $y = b^2$. The “geometric mean” $\sqrt{xy}$
is not larger than the “arithmetic mean” = average $\frac{1}{2}(x + y)$.


Geometric mean ≤ Arithmetic mean

$$ab \le \frac{a^2 + b^2}{2} \quad \text{becomes} \quad \sqrt{xy} \le \frac{x + y}{2}.$$


Example 5 had a = 2 and b = 1. So x = 4 and y = 1. The geometric mean $\sqrt{xy} = 2$
is below the arithmetic mean $\frac{1}{2}(1 + 4) = 2.5$.
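
A quick numerical check of the two means, as a sketch in plain Python. The function names are chosen for this illustration, and both inputs are assumed to be nonnegative (they are squares $a^2$ and $b^2$ in Example 6).

```python
import math


def geometric_mean(x, y):
    """sqrt(xy) for nonnegative x and y."""
    return math.sqrt(x * y)


def arithmetic_mean(x, y):
    """The ordinary average (x + y) / 2."""
    return (x + y) / 2


print(geometric_mean(4, 1), arithmetic_mean(4, 1))   # the numbers from Example 5
print(geometric_mean(2, 8), arithmetic_mean(2, 8))
```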


Notes on Computing 


Write the components of v as v(1),..., v(N) and similarly for w. In FORTRAN, the sum 
v + w requires a loop to add components separately. The dot product also uses a loop to 
add the separate v(j)w(j). Here are VPLUSW and VDOTW: 


FORTRAN (sum v + w)
      DO 10 J = 1,N
   10 VPLUSW(J) = V(J) + W(J)

FORTRAN (dot product v · w)
      DO 10 J = 1,N
   10 VDOTW = VDOTW + V(J) * W(J)


MATLAB and also PYTHON work directly with whole vectors, not their components. 
No loop is needed. When v and w have been defined, v + w is immediately understood. 
Input v and w as rows. The prime ' transposes them to columns. 2v + 3w uses * for
multiplication by 2 and 3. The result will be printed unless the line ends in a semicolon.

MATLAB    v = [2 3 4]' ; w = [1 1 1]' ; u = 2*v + 3*w

The dot product v · w is usually seen as a row times a column (with no dot):


Instead of $\begin{bmatrix} 1 \\ 2 \end{bmatrix} \cdot \begin{bmatrix} 3 \\ 4 \end{bmatrix}$ we more often see $\begin{bmatrix} 1 & 2 \end{bmatrix} \begin{bmatrix} 3 \\ 4 \end{bmatrix}$ or v' * w


The length of v is known to MATLAB as norm(v). We could define it ourselves as
sqrt(v' * v), using the square root function, which is also known. The cosine we have to define
ourselves! The angle (in radians) comes from the arc cosine (acos) function:


Cosine formula    cosine = v' * w / (norm(v) * norm(w))
Angle formula     angle = acos(cosine)


An M-file would create a new function cosine (v, w) for future use. The M-files created 
especially for this book are listed at the end. R and PYTHON are open source software. 
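
For readers working in Python, here is one possible counterpart of the MATLAB lines above. It is a sketch using only the standard `math` module; the function names `norm`, `cosine`, and `angle` are chosen here to mirror the MATLAB names and are not code from the book. The clamp to [-1, 1] is an added guard against roundoff, and both vectors are assumed nonzero.

```python
import math


def norm(v):
    """Length of v, the analogue of sqrt(v' * v)."""
    return math.sqrt(sum(x * x for x in v))


def cosine(v, w):
    """Cosine of the angle between nonzero vectors v and w."""
    c = sum(a * b for a, b in zip(v, w)) / (norm(v) * norm(w))
    return max(-1.0, min(1.0, c))   # roundoff can push c slightly outside [-1, 1]


def angle(v, w):
    """Angle in radians, from the arc cosine."""
    return math.acos(cosine(v, w))


v = [2, 3, 4]
w = [1, 1, 1]
u = [2 * a + 3 * b for a, b in zip(v, w)]   # the combination 2v + 3w
print(u)
print(cosine(v, w), angle(v, w))
```

Plain Python lists do not add componentwise, so the combination 2v + 3w is written out with a comprehension.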


REVIEW OF THE KEY IDEAS


1. The dot product v · w multiplies each component $v_i$ by $w_i$ and adds all $v_i w_i$.
2. The length ||v|| of a vector is the square root of v · v.
3. u = v/||v|| is a unit vector. Its length is 1.
4. The dot product is v · w = 0 when vectors v and w are perpendicular.
5. The cosine of θ (the angle between any nonzero v and w) never exceeds 1:

$$\cos\theta = \frac{v \cdot w}{\|v\|\,\|w\|} \qquad \text{Schwarz inequality} \quad |v \cdot w| \le \|v\|\,\|w\|.$$


Problem 21 will produce the triangle inequality $\|v + w\| \le \|v\| + \|w\|$.


WORKED EXAMPLES


1.2A For the vectors v = (3, 4) and w = (4, 3) test the Schwarz inequality on v · w
and the triangle inequality on ||v + w||. Find cos θ for the angle between v and w.
When will we have equality |v · w| = ||v|| ||w|| and ||v + w|| = ||v|| + ||w||?
Solution The dot product is v · w = (3)(4) + (4)(3) = 24. The length of v is
$\|v\| = \sqrt{9 + 16} = 5$ and also ||w|| = 5. The sum v + w = (7, 7) has length $7\sqrt{2} < 10$.


Schwarz inequality |v · w| ≤ ||v|| ||w|| is 24 < 25.


Triangle inequality ||v + w|| ≤ ||v|| + ||w|| is $7\sqrt{2} < 5 + 5$.


Cosine of angle cos θ = 24/25. Thin angle from v = (3, 4) to w = (4, 3).
Suppose one vector is a multiple of the other as in w = cv. Then the angle is 0° or 180°.
In this case |cos θ| = 1 and |v · w| equals ||v|| ||w||. If the angle is 0°, as in w = 2v, then
||v + w|| = ||v|| + ||w||. The triangle is completely flat.


1.2B Find a unit vector u in the direction of v = (3,4). Find a unit vector U that is 
perpendicular to u. How many possibilities for U? 


Solution For a unit vector u, divide v by its length ||v|| = 5. For a perpendicular vector
V we can choose (-4, 3) since the dot product v · V is (3)(-4) + (4)(3) = 0. For a unit
vector U, divide V by its length ||V||:


$$u = \frac{v}{\|v\|} = \left(\frac{3}{5}, \frac{4}{5}\right) \qquad U = \frac{V}{\|V\|} = \left(-\frac{4}{5}, \frac{3}{5}\right) \qquad u \cdot U = 0$$


The only other perpendicular unit vector would be -U = (4/5, -3/5).


1.2C Find a vector x = (c, d) that has dot products x · r = 1 and x · s = 0 with the
given vectors r = (2, -1) and s = (-1, 2).
How is this question related to Example 1.1 C, which solved cv + dw = b = (1, 0)?


Solution Those two dot products give linear equations for c and d. Then x = (c, d).


x · r = 1 is 2c - d = 1
x · s = 0 is -c + 2d = 0
These are the same equations as in Worked Example 1.1 C.


The second equation makes x perpendicular to s = (-1, 2). So I can see the geometry:
Go in the perpendicular direction (2, 1). When you reach x = (1/3)(2, 1), the dot product
with r = (2, -1) has the required value x · r = 1.
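
The two dot product conditions are a 2 by 2 linear system, and a short sketch can confirm the answer x = (2/3, 1/3). The function `solve_2x2` below is a hypothetical helper written for this illustration. It eliminates c from the second equation and uses exact fractions; it assumes the first component of r is nonzero.

```python
from fractions import Fraction


def solve_2x2(r, s, b1, b2):
    """Find x = (c, d) with x . r = b1 and x . s = b2 by elimination."""
    r0, r1 = Fraction(r[0]), Fraction(r[1])
    s0, s1 = Fraction(s[0]), Fraction(s[1])
    if r0 == 0:
        raise ValueError("this sketch assumes r[0] is nonzero")
    m = s0 / r0                 # multiple of equation 1 to subtract from equation 2
    pivot = s1 - m * r1         # coefficient of d after elimination
    if pivot == 0:
        raise ValueError("r and s are parallel: no unique solution")
    d = (b2 - m * b1) / pivot
    c = (b1 - r1 * d) / r0      # back substitution into equation 1
    return c, d


c, d = solve_2x2((2, -1), (-1, 2), 1, 0)
print(c, d)
```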

Comment on n equations for $x = (x_1, \ldots, x_n)$ in n-dimensional space
Section 1.1 would start with column vectors $v_1, \ldots, v_n$. The goal is to combine them to
produce a required vector $x_1 v_1 + \cdots + x_n v_n = b$. This section would start from vectors
$r_1, \ldots, r_n$. Now the goal is to find x with the required dot products $x \cdot r_i = b_i$.

Soon the v’s will be the columns of a matrix A, and the r’s will be the rows of A.
Then the (one and only) problem will be to solve Ax = b.
Problem Set 1.2


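1 Calculate the dot products u · v and u · w and u · (v + w) and w · v:


$$u = \begin{bmatrix} -.6 \\ .8 \end{bmatrix} \quad v = \begin{bmatrix} 3 \\ 4 \end{bmatrix} \quad w = \begin{bmatrix} 8 \\ 6 \end{bmatrix}.$$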
2 Compute the lengths ||u|| and ||v|| and ||w|| of those vectors. Check the Schwarz
inequalities |u · v| ≤ ||u|| ||v|| and |v · w| ≤ ||v|| ||w||.


3 Find unit vectors in the directions of v and w in Problem 1, and the cosine of the
angle θ. Choose vectors a, b, c that make 0°, 90°, and 180° angles with w.


4 For any unit vectors v and w, find the dot products (actual numbers) of
(a) v and -v (b) v + w and v - w (c) v - 2w and v + 2w


5 Find unit vectors $u_1$ and $u_2$ in the directions of v = (3, 1) and w = (2, 1, 2).
Find unit vectors $U_1$ and $U_2$ that are perpendicular to $u_1$ and $u_2$.


6 (a) Describe every vector $w = (w_1, w_2)$ that is perpendicular to v = (2, -1).
(b) The vectors that are perpendicular to V = (1, 1, 1) lie on a ____.
(c) The vectors that are perpendicular to (1, 1, 1) and (1, 2, 3) lie on a ____.


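7 Find the angle θ (from its cosine) between these pairs of vectors:


(a) $v = \begin{bmatrix} 1 \\ \sqrt{3} \end{bmatrix}$ and $w = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$ (b) $v = \begin{bmatrix} 2 \\ 2 \\ -1 \end{bmatrix}$ and $w = \begin{bmatrix} 2 \\ -1 \\ 2 \end{bmatrix}$


(c) $v = \begin{bmatrix} 1 \\ \sqrt{3} \end{bmatrix}$ and $w = \begin{bmatrix} -1 \\ \sqrt{3} \end{bmatrix}$ (d) $v = \begin{bmatrix} 3 \\ 1 \end{bmatrix}$ and $w = \begin{bmatrix} -1 \\ -2 \end{bmatrix}$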


8 True or false (give a reason if true or a counterexample if false):
(a) If u is perpendicular (in three dimensions) to v and w, those vectors v and w
are parallel.
(b) If u is perpendicular to v and w, then u is perpendicular to v + 2w.
(c) If u and v are perpendicular unit vectors then $\|u - v\| = \sqrt{2}$.


9 The slopes of the arrows from (0, 0) to $(v_1, v_2)$ and $(w_1, w_2)$ are $v_2/v_1$ and $w_2/w_1$.
Suppose the product $v_2 w_2 / v_1 w_1$ of those slopes is -1. Show that v · w = 0 and
the vectors are perpendicular.


10 Draw arrows from (0, 0) to the points v = (1, 2) and w = (-2, 1). Multiply their
slopes. That answer is a signal that v · w = 0 and the arrows are ____.


11 If v · w is negative, what does this say about the angle between v and w? Draw a
3-dimensional vector v (an arrow), and show where to find all w’s with v · w < 0.
12 With v = (1, 1) and w = (1, 5) choose a number c so that w - cv is perpendicular
to v. Then find the formula that gives this number c for any nonzero v and w.
(Note: cv is the “projection” of w onto v.)


13 Find two vectors v and w that are perpendicular to (1, 0, 1) and to each other.

14 Find nonzero vectors u, v, w that are perpendicular to (1, 1, 1, 1) and to each other.


15 The geometric mean of x = 2 and y = 8 is $\sqrt{xy} = 4$. The arithmetic mean is larger:
$\frac{1}{2}(x + y)$ = ____. This would come in Example 6 from the Schwarz inequality
for $v = (\sqrt{2}, \sqrt{8})$ and $w = (\sqrt{8}, \sqrt{2})$. Find cos θ for this v and w.


16 How long is the vector v = (1, 1, ..., 1) in 9 dimensions? Find a unit vector u in
the same direction as v and a unit vector w that is perpendicular to v.


17 What are the cosines of the angles α, β, θ between the vector (1, 0, -1) and the unit
vectors i, j, k along the axes? Check the formula cos² α + cos² β + cos² θ = 1.


Problems 18-31 lead to the main facts about lengths and angles in triangles. 


18 The parallelogram with sides v = (4, 2) and w = (-1, 2) is a rectangle. Check the
Pythagoras formula $a^2 + b^2 = c^2$ which is for right triangles only:

$$(\text{length of } v)^2 + (\text{length of } w)^2 = (\text{length of } v + w)^2.$$


19 (Rules for dot products) These equations are simple but useful:

(1) v · w = w · v (2) u · (v + w) = u · v + u · w (3) (cv) · w = c(v · w)

Use (2) with u = v + w to prove $\|v + w\|^2 = v \cdot v + 2 v \cdot w + w \cdot w$.


20 The “Law of Cosines” comes from (v - w) · (v - w) = v · v - 2v · w + w · w:

Cosine Law $\|v - w\|^2 = \|v\|^2 - 2\|v\|\,\|w\| \cos\theta + \|w\|^2$.

If θ < 90° show that $\|v\|^2 + \|w\|^2$ is larger than $\|v - w\|^2$ (the third side).


21 The triangle inequality says: (length of v + w) ≤ (length of v) + (length of w).

Problem 19 found $\|v + w\|^2 = \|v\|^2 + 2 v \cdot w + \|w\|^2$. Use the Schwarz inequality
v · w ≤ ||v|| ||w|| to show that ||side 3|| can not exceed ||side 1|| + ||side 2||:

Triangle inequality $\|v + w\|^2 \le (\|v\| + \|w\|)^2$ or $\|v + w\| \le \|v\| + \|w\|$.


22 The Schwarz inequality |v · w| ≤ ||v|| ||w|| by algebra instead of trigonometry:

(a) Multiply out both sides of $(v_1 w_1 + v_2 w_2)^2 \le (v_1^2 + v_2^2)(w_1^2 + w_2^2)$.

(b) Show that the difference between those two sides equals $(v_1 w_2 - v_2 w_1)^2$.
This cannot be negative since it is a square, so the inequality is true.
23 The figure shows that cos α = $v_1/\|v\|$ and sin α = $v_2/\|v\|$. Similarly cos β is
____ and sin β is ____. The angle θ is β - α. Substitute into the trigonometry
formula cos β cos α + sin β sin α for cos(β - α) to find cos θ = v · w/||v|| ||w||.


24 One-line proof of the Schwarz inequality |u · U| ≤ 1 for unit vectors:

$$|u \cdot U| \le |u_1|\,|U_1| + |u_2|\,|U_2| \le \frac{u_1^2 + U_1^2}{2} + \frac{u_2^2 + U_2^2}{2} = \frac{1 + 1}{2} = 1.$$

Put $(u_1, u_2)$ = (.6, .8) and $(U_1, U_2)$ = (.8, .6) in that whole line and find cos θ.


25 Why is |cos θ| never greater than 1 in the first place?


26 If v = (1, 2) draw all vectors w = (x, y) in the xy plane with v · w = x + 2y = 5.
Which is the shortest w?


27 (Recommended) If ||v|| = 5 and ||w|| = 3, what are the smallest and largest values
of ||v - w||? What are the smallest and largest values of v · w?


Challenge Problems 


28 Can three vectors in the xy plane have u · v < 0 and v · w < 0 and u · w < 0?
I don’t know how many vectors in xyz space can have all negative dot products.
(Four of those vectors in the plane would certainly be impossible ...).


29 Pick any numbers that add to x + y + z = 0. Find the angle between your vector
v = (x, y, z) and the vector w = (z, x, y). Challenge question: Explain why
v · w/||v|| ||w|| is always -1/2.


30 How could you prove $\sqrt[3]{xyz} \le \frac{1}{3}(x + y + z)$ (geometric mean ≤ arithmetic mean)?


31 Find four perpendicular unit vectors with all components equal to 1/2 or -1/2.


32 Using v = randn(3, 1) in MATLAB, create a random unit vector u = v/||v||. Using
V = randn(3, 30) create 30 more random unit vectors $U_j$. What is the average size
of the dot products $|u \cdot U_j|$? In calculus, the average $\int_0^\pi |\cos\theta|\,d\theta / \pi = 2/\pi$.
1.3 Matrices


This section is based on two carefully chosen examples. They both start with three vectors. 
I will take their combinations using matrices. The three vectors in the first example are 
u,v, and w: 


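First example

$$u = \begin{bmatrix} 1 \\ -1 \\ 0 \end{bmatrix} \quad v = \begin{bmatrix} 0 \\ 1 \\ -1 \end{bmatrix} \quad w = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}$$


Their linear combinations in three-dimensional space are cu + dv + ew:


Combinations

$$c \begin{bmatrix} 1 \\ -1 \\ 0 \end{bmatrix} + d \begin{bmatrix} 0 \\ 1 \\ -1 \end{bmatrix} + e \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix} = \begin{bmatrix} c \\ d - c \\ e - d \end{bmatrix}. \tag{1}$$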


Now something important: Rewrite that combination using a matrix. The vectors u, v, w 
go into the columns of the matrix A. That matrix “multiplies” a vector: 


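Same combination is now A times x

$$\begin{bmatrix} 1 & 0 & 0 \\ -1 & 1 & 0 \\ 0 & -1 & 1 \end{bmatrix} \begin{bmatrix} c \\ d \\ e \end{bmatrix} = \begin{bmatrix} c \\ d - c \\ e - d \end{bmatrix}. \tag{2}$$


The numbers c, d, e are the components of a vector x. The matrix A times the vector x
is the same as the combination cu + dv + ew of the three columns:


Matrix times vector

$$Ax = \begin{bmatrix} u & v & w \end{bmatrix} \begin{bmatrix} c \\ d \\ e \end{bmatrix} = cu + dv + ew. \tag{3}$$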


This is more than a definition of Ax, because the rewriting brings a crucial change in 
viewpoint. At first, the numbers c,d,e were multiplying the vectors. Now the matrix 
is multiplying those numbers. The matrix A acts on the vector x. The result Ax is a 
combination b of the columns of A. 

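To see that action, I will write $x_1, x_2, x_3$ instead of c, d, e. I will write $b_1, b_2, b_3$
for the components of Ax. With new letters we see


$$Ax = \begin{bmatrix} 1 & 0 & 0 \\ -1 & 1 & 0 \\ 0 & -1 & 1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} x_1 \\ x_2 - x_1 \\ x_3 - x_2 \end{bmatrix} = \begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix} = b. \tag{4}$$


The input is x and the output is b = Ax. This A is a “difference matrix” because b
contains differences of the input vector x. The top difference is $x_1 - x_0 = x_1 - 0$.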
Here is an example to show differences of numbers (squares in x, odd numbers in b):


$$x = \begin{bmatrix} 1 \\ 4 \\ 9 \end{bmatrix} = \text{squares} \qquad Ax = \begin{bmatrix} 1 - 0 \\ 4 - 1 \\ 9 - 4 \end{bmatrix} = \begin{bmatrix} 1 \\ 3 \\ 5 \end{bmatrix} = b. \tag{5}$$


That pattern would continue for a 4 by 4 difference matrix. The next square would be
$x_4 = 16$. The next difference would be $x_4 - x_3 = 16 - 9 = 7$ (this is the next odd
number). The matrix finds all the differences at once.
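
The column picture of Ax can be tried out directly. The sketch below builds Ax as a combination of the columns, in plain Python; the function name `combine_columns` is an assumption of this example.

```python
def combine_columns(columns, x):
    """Return x[0]*columns[0] + x[1]*columns[1] + ... (Ax as a combination of columns)."""
    b = [0] * len(columns[0])
    for xj, col in zip(x, columns):
        for i, entry in enumerate(col):
            b[i] += xj * entry
    return b


u, v, w = (1, -1, 0), (0, 1, -1), (0, 0, 1)
print(combine_columns([u, v, w], [1, 4, 9]))   # differences of the squares

# The 4 by 4 difference matrix continues the pattern.
cols4 = [(1, -1, 0, 0), (0, 1, -1, 0), (0, 0, 1, -1), (0, 0, 0, 1)]
print(combine_columns(cols4, [1, 4, 9, 16]))
```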


Important Note. You may already have learned about multiplying Ax, a matrix times a 
vector. Probably it was explained differently, using the rows instead of the columns. The 
usual way takes the dot product of each row with x: 


Dot products with rows

$$Ax = \begin{bmatrix} 1 & 0 & 0 \\ -1 & 1 & 0 \\ 0 & -1 & 1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} (1, 0, 0) \cdot (x_1, x_2, x_3) \\ (-1, 1, 0) \cdot (x_1, x_2, x_3) \\ (0, -1, 1) \cdot (x_1, x_2, x_3) \end{bmatrix}$$


Those dot products are the same $x_1$ and $x_2 - x_1$ and $x_3 - x_2$ that we wrote in equation (4).
The new way is to work with Ax a column at a time. Linear combinations are the key to
linear algebra, and the output Ax is a linear combination of the columns of A.

With numbers, you can multiply Ax either way (I admit to using rows). With letters,
columns are the good way. Chapter 2 will repeat these rules of matrix multiplication, and
explain the underlying ideas. There we will multiply matrices both ways.
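
Both ways of multiplying Ax can be written side by side to see that they agree. This is a sketch in plain Python, with the matrix stored as a list of rows; the function names are chosen for this illustration.

```python
A = [[1, 0, 0],
     [-1, 1, 0],
     [0, -1, 1]]


def multiply_by_rows(A, x):
    """Each component of Ax is the dot product of a row of A with x."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def multiply_by_columns(A, x):
    """Ax is the combination x[0]*(column 0) + x[1]*(column 1) + ..."""
    b = [0] * len(A)
    for j, xj in enumerate(x):
        for i in range(len(A)):
            b[i] += xj * A[i][j]
    return b


x = [1, 4, 9]
print(multiply_by_rows(A, x))
print(multiply_by_columns(A, x))
```

The two functions visit the same nine products in a different order, so the outputs must match.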


Linear Equations 


One more change in viewpoint is crucial. Up to now, the numbers $x_1, x_2, x_3$ were known
(called c, d, e at first). The right hand side b was not known. We found that vector of
differences by multiplying Ax. Now we think of b as known and we look for x.


Old question: Compute the linear combination $x_1 u + x_2 v + x_3 w$ to find b.
New question: Which combination of u, v, w produces a particular vector b?


This is the inverse problem: to find the input x that gives the desired output b = Ax. You
have seen this before, as a system of linear equations for $x_1, x_2, x_3$. The right hand sides
of the equations are $b_1, b_2, b_3$. We can solve that system to find $x_1, x_2, x_3$:


$$Ax = b \qquad \begin{array}{rcl} x_1 &=& b_1 \\ -x_1 + x_2 &=& b_2 \\ -x_2 + x_3 &=& b_3 \end{array} \qquad \text{Solution} \quad \begin{array}{rcl} x_1 &=& b_1 \\ x_2 &=& b_1 + b_2 \\ x_3 &=& b_1 + b_2 + b_3. \end{array} \tag{6}$$


Let me admit right away: most linear systems are not so easy to solve. In this example,
the first equation decided $x_1 = b_1$. Then the second equation produced $x_2 = b_1 + b_2$.
The equations could be solved in order (top to bottom) because the matrix A was selected
to be lower triangular.
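Look at two specific choices 0, 0, 0 and 1, 3, 5 of the right sides $b_1, b_2, b_3$:


$$b = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix} \text{ gives } x = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix} \qquad b = \begin{bmatrix} 1 \\ 3 \\ 5 \end{bmatrix} \text{ gives } x = \begin{bmatrix} 1 \\ 1 + 3 \\ 1 + 3 + 5 \end{bmatrix} = \begin{bmatrix} 1 \\ 4 \\ 9 \end{bmatrix}$$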
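
Solving top to bottom is a short loop for any lower triangular matrix with nonzero diagonal entries. The sketch below is an illustration in plain Python; the name `forward_substitution` is the usual term for this procedure but does not appear in the text.

```python
def forward_substitution(L, b):
    """Solve Lx = b from the top equation down, for lower triangular L
    with nonzero diagonal entries."""
    n = len(b)
    x = [0] * n
    for i in range(n):
        known = sum(L[i][j] * x[j] for j in range(i))   # terms already solved for
        x[i] = (b[i] - known) / L[i][i]
    return x


A = [[1, 0, 0],
     [-1, 1, 0],
     [0, -1, 1]]

print(forward_substitution(A, [0, 0, 0]))
print(forward_substitution(A, [1, 3, 5]))   # recovers the squares
```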


The first solution (all zeros) is more important than it looks. In words: If the output is
b = 0, then the input must be x = 0. That statement is true for this matrix A. It is not true
for all matrices. Our second example will show (for a different matrix C) how we can have
Cx = 0 when C ≠ 0 and x ≠ 0.

This matrix A is “invertible”. From b we can recover x. 


The Inverse Matrix 


Let me repeat the solution x in equation (6). A sum matrix will appear! 


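$$Ax = b \text{ is solved by } \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} b_1 \\ b_1 + b_2 \\ b_1 + b_2 + b_3 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 1 & 1 & 1 \end{bmatrix} \begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix}. \tag{7}$$


If the differences of the x’s are the b’s, the sums of the b’s are the x’s. That was true for
the odd numbers b = (1, 3, 5) and the squares x = (1, 4, 9). It is true for all vectors.
The sum matrix S in equation (7) is the inverse of the difference matrix A.

Example: The differences of x = (1, 2, 3) are b = (1, 1, 1). So b = Ax and x = Sb:


$$Ax = \begin{bmatrix} 1 & 0 & 0 \\ -1 & 1 & 0 \\ 0 & -1 & 1 \end{bmatrix} \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} \quad \text{and} \quad Sb = \begin{bmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 1 & 1 & 1 \end{bmatrix} \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}$$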
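
That S undoes A can be checked by multiplying the two matrices in both orders. This is a sketch in plain Python; the helpers `matmul` and `matvec` are written for this illustration (matrix multiplication itself is explained in Chapter 2).

```python
def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]


def matvec(M, x):
    """Matrix times vector, by dot products with the rows."""
    return [sum(m * xj for m, xj in zip(row, x)) for row in M]


A = [[1, 0, 0], [-1, 1, 0], [0, -1, 1]]   # difference matrix
S = [[1, 0, 0], [1, 1, 0], [1, 1, 1]]     # sum matrix
I = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]     # identity

print(matmul(S, A) == I, matmul(A, S) == I)
print(matvec(S, [1, 3, 5]))               # sums of the odd numbers give the squares
```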


Equation (7) for the solution vector $x = (x_1, x_2, x_3)$ tells us two important facts:
1. For every b there is one solution to Ax = b. 2. A matrix S produces x = Sb.


The next chapters ask about other equations Ax = b. Is there a solution? How is it
computed? In linear algebra, the notation for the “inverse matrix” is $A^{-1}$:


Ax = b is solved by $x = A^{-1} b = Sb$.


Note on calculus. Let me connect these special matrices A and S to calculus. The vector
x changes to a function x(t). The differences Ax become the derivative dx/dt = b(t). In
the inverse direction, the sum Sb becomes the integral of b(t). The Fundamental Theorem
of Calculus says that integration S is the inverse of differentiation A.


$$Ax = b \text{ and } x = Sb \qquad \frac{dx}{dt} = b \text{ and } x(t) = \int_0^t b. \tag{8}$$
The derivative of distance traveled (x) is the velocity (b). The integral of b(t) is the
distance x(t). Instead of adding +C, I measured the distance from x(0) = 0. In the
same way, the differences started at $x_0 = 0$. This zero start makes the pattern complete,
when we write $x_1 - x_0$ for the first component of Ax (we just wrote $x_1$).

Notice another analogy with calculus. The differences of squares 0, 1, 4, 9 are odd
numbers 1, 3, 5. The derivative of $x(t) = t^2$ is 2t. A perfect analogy would have produced
the even numbers b = 2, 4, 6 at times t = 1, 2, 3. But differences are not the same
as derivatives, and our matrix A produces not 2t but 2t - 1 (these one-sided “backward
differences” are centered at t - 1/2):


$$x(t) - x(t - 1) = t^2 - (t - 1)^2 = t^2 - (t^2 - 2t + 1) = 2t - 1. \tag{9}$$


The Problem Set will follow up to show that “forward differences” produce 2t + 1.
A better choice (not always seen in calculus courses) is a centered difference that uses
x(t + 1) - x(t - 1). Divide Δx by the distance Δt from t - 1 to t + 1, which is 2:


Centered difference of $x(t) = t^2$

$$\frac{(t + 1)^2 - (t - 1)^2}{2} = 2t \text{ exactly}. \tag{10}$$
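
The three kinds of differences are easy to compare for $x(t) = t^2$. The sketch below uses plain Python with a step of 1, as in the text; the function names are chosen for this illustration.

```python
def x(t):
    return t * t


def backward_difference(t):
    return x(t) - x(t - 1)            # 2t - 1


def forward_difference(t):
    return x(t + 1) - x(t)            # 2t + 1


def centered_difference(t):
    return (x(t + 1) - x(t - 1)) / 2  # divide by the distance from t - 1 to t + 1


for t in (1, 2, 3):
    print(t, backward_difference(t), forward_difference(t), centered_difference(t))
```

Only the centered difference reproduces the derivative 2t exactly for this function.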


Difference matrices are great. Centered is best. Our second example is not invertible. 


Cyclic Differences 


This example keeps the same columns u and v but changes w to a new vector w*: 


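Second example

$$u = \begin{bmatrix} 1 \\ -1 \\ 0 \end{bmatrix} \quad v = \begin{bmatrix} 0 \\ 1 \\ -1 \end{bmatrix} \quad w^* = \begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix}$$


Now the linear combinations of u, v, w* lead to a cyclic difference matrix C:


Cyclic

$$Cx = \begin{bmatrix} 1 & 0 & -1 \\ -1 & 1 & 0 \\ 0 & -1 & 1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} x_1 - x_3 \\ x_2 - x_1 \\ x_3 - x_2 \end{bmatrix} = b. \tag{11}$$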

This matrix C is not triangular. It is not so simple to solve for x when we are given b. 
Actually it is impossible to find the solution to Cx = b, because the three equations either 
have infinitely many solutions or else no solution: 


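Cx = 0, infinitely many x:

$$\begin{bmatrix} x_1 - x_3 \\ x_2 - x_1 \\ x_3 - x_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix} \text{ is solved by all vectors } \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} c \\ c \\ c \end{bmatrix}. \tag{12}$$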
Every constant vector (c, c, c) has zero differences when we go cyclically. This undetermined
constant c is like the + C that we add to integrals. The cyclic differences have
$x_1 - x_3$ in the first component, instead of starting from $x_0 = 0$.

The other very likely possibility for Cx = b is no solution at all:


$$Cx = b \qquad \begin{bmatrix} x_1 - x_3 \\ x_2 - x_1 \\ x_3 - x_2 \end{bmatrix} = \begin{bmatrix} 1 \\ 3 \\ 5 \end{bmatrix} \qquad \begin{array}{l} \text{Left sides add to } 0 \\ \text{Right sides add to } 9 \\ \text{No solution } x_1, x_2, x_3 \end{array} \tag{13}$$


Look at this example geometrically. No combination of u, v, and w* will produce the
vector b = (1, 3, 5). The combinations don’t fill the whole three-dimensional space.
The right sides must have $b_1 + b_2 + b_3 = 0$ to allow a solution to Cx = b, because
the left sides $x_1 - x_3$, $x_2 - x_1$, and $x_3 - x_2$ always add to zero.
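
Both facts about the cyclic matrix C can be seen numerically. This is a sketch in plain Python; the helper `matvec` and the test `can_have_solution` are names invented for this illustration, and the test only checks the necessary condition stated above.

```python
C = [[1, 0, -1],
     [-1, 1, 0],
     [0, -1, 1]]


def matvec(M, x):
    """Matrix times vector, by dot products with the rows."""
    return [sum(m * xj for m, xj in zip(row, x)) for row in M]


def can_have_solution(b):
    """Necessary condition for Cx = b: the components of b add to zero."""
    return sum(b) == 0


print(matvec(C, [7, 7, 7]))            # a constant vector has zero cyclic differences
b = matvec(C, [1, 4, 9])
print(b, sum(b))                       # the outputs always add to zero
print(can_have_solution([1, 3, 5]))    # 1 + 3 + 5 = 9, so no solution
```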

Put that in different words. All linear combinations $x_1 u + x_2 v + x_3 w^* = b$ lie on
the plane given by $b_1 + b_2 + b_3 = 0$. This subject is suddenly connecting algebra with
geometry. Linear combinations can fill all of space, or only a plane. We need a picture to
show the crucial difference between u, v, w (the first example) and u, v, w*.


Figure 1.10: Independent vectors u, v, w. Dependent vectors u, v, w* in a plane. 


Independence and Dependence 


Figure 1.10 shows those column vectors, first of the matrix A and then of C. The first two 
columns u and v are the same in both pictures. If we only look at the combinations of those 
two vectors, we will get a two-dimensional plane. The key question is whether the third 
vector is in that plane: 


Independence w is not in the plane of u and v. 
Dependence w* is in the plane of u and v.


The important point is that the new vector w* is a linear combination of u and v: 


$$u + v + w^* = 0 \qquad w^* = \begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix} = -u - v. \qquad (14)$$

All three vectors u, v, w* have components adding to zero. Then all their combinations
will have b1 + b2 + b3 = 0 (as we saw above, by adding the three equations). This is
the equation for the plane containing all combinations of u and v. By including w* we get
no new vectors because w* is already on that plane.

The original w = (0, 0, 1) is not on the plane: 0 + 0 + 1 ≠ 0. The combinations of
u,v, w fill the whole three-dimensional space. We know this already, because the solution 
x = Sb in equation (6) gave the right combination to produce any b. 

The two matrices A and C, with third columns w and w*, allowed me to mention two 
key words of linear algebra: independence and dependence. The first half of the course will 
develop these ideas much further—I am happy if you see them early in the two examples: 


u, v, w are independent. No combination except 0u + 0v + 0w = 0 gives b = 0.
u, v, w* are dependent. Other combinations (specifically u + v + w*) give b = 0.
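A small Python sketch can make the two statements concrete. It is only an illustration under stated assumptions: the function name `combine` is hypothetical, and the vectors are the columns u, v, w, w* used in this section.

```python
def combine(coeffs, vectors):
    """Return the linear combination: sum of coeffs[k] * vectors[k]."""
    n = len(vectors[0])
    return [sum(c * v[i] for c, v in zip(coeffs, vectors)) for i in range(n)]

u = [1, -1, 0]
v = [0, 1, -1]
w = [0, 0, 1]        # third column of the difference matrix A
w_star = [-1, 0, 1]  # third column of the cyclic matrix C

# Dependent: the nonzero combination 1u + 1v + 1w* is the zero vector.
print(combine([1, 1, 1], [u, v, w_star]))

# Independent: x1*u + x2*v + x3*w = (x1, x2 - x1, x3 - x2), which is zero
# only when x1 = 0, then x2 = 0, then x3 = 0. The same weights now give (1, 0, 0).
print(combine([1, 1, 1], [u, v, w]))
```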


You can picture this in three dimensions. The three vectors lie in a plane or they don’t. 
Chapter 2 has n vectors in n-dimensional space. Independence or dependence is the key 
point. The vectors go into the columns of an n by n matrix: 


Independent columns: Ax = 0 has one solution. A is an invertible matrix. 


Dependent columns: Ax = 0 has many solutions. A is a singular matrix. 


Eventually we will have n vectors in m-dimensional space. The matrix A with those n 
columns is now rectangular (m by n). Understanding Ax = b is the problem of Chapter 3. 


REVIEW OF THE KEY IDEAS


1. Matrix times vector: Ax = combination of the columns of A.

2. The solution to Ax = b is x = A^{-1}b, when A is an invertible matrix.

3. The difference matrix A is inverted by the sum matrix S = A^{-1}.

4. The cyclic matrix C has no inverse. Its three columns lie in the same plane.
Those dependent columns add to the zero vector. Cx = 0 has many solutions.

5. This section is looking ahead to key ideas, not fully explained yet.


WORKED EXAMPLES


1.3 A Change the southwest entry a31 of A (row 3, column 1) to a31 = 1:


$$Ax = b \qquad \begin{bmatrix} 1 & 0 & 0 \\ -1 & 1 & 0 \\ 1 & -1 & 1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} x_1 \\ -x_1 + x_2 \\ x_1 - x_2 + x_3 \end{bmatrix} = \begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix}$$


Find the solution x for any b. From x = A^{-1}b read off the inverse matrix A^{-1}.

Solution Solve the (linear triangular) system Ax = b from top to bottom:


first x1 = b1
then x2 = b1 + b2
then x3 = b2 + b3

$$\text{This says that} \quad x = A^{-1}b = \begin{bmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 0 & 1 & 1 \end{bmatrix} \begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix}$$


This is good practice to see the columns of the inverse matrix multiplying b1, b2, and b3.
The first column of A^{-1} is the solution for b = (1, 0, 0). The second column is the solution
for b = (0, 1, 0). The third column x of A^{-1} is the solution for Ax = b = (0, 0, 1).

The three columns of A are still independent. They don’t lie in a plane. The combinations
of those three columns, using the right weights x1, x2, x3, can produce any
three-dimensional vector b = (b1, b2, b3). Those weights come from x = A^{-1}b.
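The top-to-bottom solution, and the column-by-column construction of the inverse, can be sketched in Python. This is a minimal illustration, not the book's code; the function name `forward_substitution` is an assumption, and it presumes a lower triangular matrix with nonzero diagonal entries.

```python
def forward_substitution(L, b):
    """Solve Lx = b for a lower triangular L with nonzero diagonal."""
    x = []
    for i, row in enumerate(L):
        known = sum(row[j] * x[j] for j in range(i))  # terms already solved
        x.append((b[i] - known) / row[i])
    return x

# Matrix of Worked Example 1.3 A (southwest entry a31 changed to 1).
A = [[1, 0, 0],
     [-1, 1, 0],
     [1, -1, 1]]

# Column k of the inverse is the solution of Ax = (k-th unit vector).
identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
inverse_columns = [forward_substitution(A, e) for e in identity]

# Transpose the list of columns so the inverse prints row by row.
A_inverse = [list(row) for row in zip(*inverse_columns)]
for row in A_inverse:
    print(row)
```

The three solves reproduce the matrix read off above from x1 = b1, x2 = b1 + b2, x3 = b2 + b3.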


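1.3 B This E is an elimination matrix. E has a subtraction, E^{-1} has an addition.


$$b = Ex \qquad \begin{bmatrix} b_1 \\ b_2 \end{bmatrix} = \begin{bmatrix} x_1 \\ x_2 - \ell x_1 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ -\ell & 1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \qquad E = \begin{bmatrix} 1 & 0 \\ -\ell & 1 \end{bmatrix}$$


The first equation is x1 = b1. The second equation is x2 - ℓx1 = b2. The inverse will add
ℓx1 = ℓb1, because the elimination matrix subtracted ℓx1:


$$x = E^{-1}b \qquad \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} b_1 \\ \ell b_1 + b_2 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ \ell & 1 \end{bmatrix} \begin{bmatrix} b_1 \\ b_2 \end{bmatrix} \qquad E^{-1} = \begin{bmatrix} 1 & 0 \\ \ell & 1 \end{bmatrix}$$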


1.3 C Change C from a cyclic difference to a centered difference producing x3 - x1:


$$Cx = b \qquad \begin{bmatrix} 0 & 1 & 0 \\ -1 & 0 & 1 \\ 0 & -1 & 0 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} x_2 - 0 \\ x_3 - x_1 \\ 0 - x_2 \end{bmatrix} = \begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix}. \qquad (15)$$


Show that Cx = b can only be solved when b1 + b3 = 0. That is a plane of vectors b
in three-dimensional space. Each column of C is in the plane, and the matrix has no inverse.
So this plane contains all combinations of those columns (which are all the vectors Cx).


Solution The first component of b = Cx is x2, and the last component of b is —x2. 
So we always have b1 + b3 = 0, for every choice of x. 

If you draw the column vectors in C, the first and third columns fall on the same line. 
In fact (column 1) = —(column 3). So the three columns will lie in a plane, and C is not 
an invertible matrix. We cannot solve Cx = b unless b1 + b3 = 0. 

I included the zeros so you could see that this matrix produces “centered differences”.
Row i of Cx is x_{i+1} (right of center) minus x_{i-1} (left of center). Here is the 4 by 4
centered difference matrix:


$$Cx = b \qquad \begin{bmatrix} 0 & 1 & 0 & 0 \\ -1 & 0 & 1 & 0 \\ 0 & -1 & 0 & 1 \\ 0 & 0 & -1 & 0 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix} = \begin{bmatrix} x_2 - 0 \\ x_3 - x_1 \\ x_4 - x_2 \\ 0 - x_3 \end{bmatrix} = \begin{bmatrix} b_1 \\ b_2 \\ b_3 \\ b_4 \end{bmatrix} \qquad (16)$$


Surprisingly this matrix is now invertible! The first and last rows give x2 and x3. Then
the middle rows give x1 and x4. It is possible to write down the inverse matrix C^{-1}. But
5 by 5 will be singular (not invertible) again ...
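The order of solution just described (first and last rows, then the middle rows) can be sketched in Python. The function names `centered_difference`, `matvec`, and `solve_centered_4` are illustrative assumptions, and the right side b = (1, 2, 3, 4) is an arbitrary test vector.

```python
def centered_difference(n):
    """n by n centered difference matrix: row i gives x[i+1] - x[i-1]."""
    C = [[0] * n for _ in range(n)]
    for i in range(n):
        if i + 1 < n:
            C[i][i + 1] = 1
        if i - 1 >= 0:
            C[i][i - 1] = -1
    return C

def matvec(M, x):
    """Multiply matrix M (a list of rows) by vector x."""
    return [sum(m * xj for m, xj in zip(row, x)) for row in M]

def solve_centered_4(b):
    """Solve the 4 by 4 system (16) in the order the text describes."""
    b1, b2, b3, b4 = b
    x2 = b1           # first row:  x2 - 0  = b1
    x3 = -b4          # last row:   0 - x3  = b4
    x1 = x3 - b2      # second row: x3 - x1 = b2
    x4 = b3 + x2      # third row:  x4 - x2 = b3
    return [x1, x2, x3, x4]

b = [1, 2, 3, 4]
x = solve_centered_4(b)
print(x, matvec(centered_difference(4), x))

# In the 3 by 3 case (column 1) = -(column 3), so x = (1, 0, 1) gives Cx = 0.
print(matvec(centered_difference(3), [1, 0, 1]))
```

Multiplying back by the 4 by 4 matrix recovers b, while the 3 by 3 matrix sends a nonzero vector to zero and therefore cannot be inverted.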
Problem Set 1.3


1 Find the linear combination 2s1 + 3s2 + 4s3 = b. Then write b as a matrix-vector
multiplication Sx. Compute the dot products (row of S) · x:


$$s_1 = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} \quad s_2 = \begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix} \quad s_3 = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix} \text{ go into the columns of } S.$$


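2 Solve these equations Sy = b with s1, s2, s3 in the columns of S:


$$\begin{bmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 1 & 1 & 1 \end{bmatrix} \begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix} = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} \quad \text{and} \quad \begin{bmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 1 & 1 & 1 \end{bmatrix} \begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix} = \begin{bmatrix} 1 \\ 4 \\ 9 \end{bmatrix}$$


The sum of the first n odd numbers is ____.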


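3 Solve these three equations for y1, y2, y3 in terms of B1, B2, B3:


$$Sy = B \qquad \begin{bmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 1 & 1 & 1 \end{bmatrix} \begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix} = \begin{bmatrix} B_1 \\ B_2 \\ B_3 \end{bmatrix}$$


Write the solution y as a matrix A = S^{-1} times the vector B. Are the columns of S
independent or dependent?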


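4 Find a combination x1w1 + x2w2 + x3w3 that gives the zero vector:


$$w_1 = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} \quad w_2 = \begin{bmatrix} 4 \\ 5 \\ 6 \end{bmatrix} \quad w_3 = \begin{bmatrix} 7 \\ 8 \\ 9 \end{bmatrix}$$

Those vectors are (independent) (dependent). The three vectors lie in a ____. The
matrix W with those columns is not invertible.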


5 The rows of that matrix W produce three vectors (I write them as columns):


$$r_1 = \begin{bmatrix} 1 \\ 4 \\ 7 \end{bmatrix} \quad r_2 = \begin{bmatrix} 2 \\ 5 \\ 8 \end{bmatrix} \quad r_3 = \begin{bmatrix} 3 \\ 6 \\ 9 \end{bmatrix}$$


Linear algebra says that these vectors must also lie in a plane. There must be many
combinations with y1r1 + y2r2 + y3r3 = 0. Find two sets of y’s.


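6 Which values of c give dependent columns (combination equals zero)?

$$\begin{bmatrix} 1 & 3 & 5 \\ 1 & 2 & 4 \\ 1 & 1 & c \end{bmatrix} \qquad \begin{bmatrix} 1 & 0 & c \\ 1 & 1 & 0 \\ 0 & 1 & 1 \end{bmatrix}$$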


U NS 
UW = O 
NAG 
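7 If the columns combine into Ax = 0 then each row has r · x = 0:


$$\begin{bmatrix} a_1 & a_2 & a_3 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix} \qquad \text{By rows} \quad \begin{bmatrix} r_1 \cdot x \\ r_2 \cdot x \\ r_3 \cdot x \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}$$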


The three rows also lie in a plane. Why is that plane perpendicular to x? 


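8 Moving to a 4 by 4 difference equation Ax = b, find the four components x1, x2,
x3, x4. Then write this solution as x = Sb to find the inverse matrix S = A^{-1}:


$$Ax = \begin{bmatrix} 1 & 0 & 0 & 0 \\ -1 & 1 & 0 & 0 \\ 0 & -1 & 1 & 0 \\ 0 & 0 & -1 & 1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix} = \begin{bmatrix} b_1 \\ b_2 \\ b_3 \\ b_4 \end{bmatrix} = b.$$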


9 What is the cyclic 4 by 4 difference matrix C? It will have 1 and -1 in each row.
Find all solutions x = (x1, x2, x3, x4) to Cx = 0. The four columns of C lie in a
“three-dimensional hyperplane” inside four-dimensional space.


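10 A forward difference matrix A is upper triangular:


$$Az = \begin{bmatrix} -1 & 1 & 0 \\ 0 & -1 & 1 \\ 0 & 0 & -1 \end{bmatrix} \begin{bmatrix} z_1 \\ z_2 \\ z_3 \end{bmatrix} = \begin{bmatrix} z_2 - z_1 \\ z_3 - z_2 \\ 0 - z_3 \end{bmatrix} = \begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix} = b.$$


Find z1, z2, z3 from b1, b2, b3. What is the inverse matrix in z = A^{-1}b?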


11 Show that the forward differences (t + 1)^2 - t^2 are 2t + 1 = odd numbers.
As in calculus, the difference (t + 1)^n - t^n will begin with the derivative of t^n,
which is ____.


12 The last lines of the Worked Example say that the 4 by 4 centered difference matrix
in (16) is invertible. Solve Cx = (b1, b2, b3, b4) to find its inverse in x = C^{-1}b.


Challenge Problems 


13 The very last words say that the 5 by 5 centered difference matrix is not invertible.
Write down the 5 equations Cx = b. Find a combination of left sides that gives
zero. What combination of b1, b2, b3, b4, b5 must be zero? (The 5 columns lie on a
“4-dimensional hyperplane” in 5-dimensional space.)


14 If (a, b) is a multiple of (c, d) with abcd ≠ 0, show that (a, c) is a multiple of
(b, d). This is surprisingly important; two columns are falling on one line. You
could use numbers first to see how a, b, c, d are related. The question will lead to:


The matrix $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$ has dependent columns when it has dependent rows.


Chapter 2 


Solving Linear Equations 


2.1 Vectors and Linear Equations 


The central problem of linear algebra is to solve a system of equations. Those equations 
are linear, which means that the unknowns are only multiplied by numbers—we never see 
x times y. Our first linear system is certainly not big. But you will see how far it leads: 


Two equations      x - 2y = 1
Two unknowns      3x + 2y = 11      (1)


We begin a row at a time. The first equation x — 2y = 1 produces a straight line in the xy 
plane. The point x = 1,y = 0 is on the line because it solves that equation. The point 
x = 3, y = 1 is also on the line because 3 - 2 = 1. If we choose x = 101 we find y = 50.


The slope of this particular line is 1/2, because y increases by 1 when x changes by 2.
But slopes are important in calculus and this is linear algebra! 


[Figure: the two lines x - 2y = 1 and 3x + 2y = 11 drawn in the xy plane.]


Figure 2.1: Row picture: The point (3, 1) where the lines meet is the solution. 


Figure 2.1 shows that line x — 2y = 1. The second line in this “row picture” comes 
from the second equation 3x + 2y = 11. You can’t miss the intersection point where the 
two lines meet. The point x = 3, y = 1 lies on both lines. That point solves both equations
at once. This is the solution to our system of linear equations.


ROWS The row picture shows two lines meeting at a single point (the solution).
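The row picture can be checked point by point. Here is a minimal Python sketch, with the hypothetical helper `on_line` and the three points mentioned above; it tests which points lie on each of the two lines.

```python
def on_line(a, b, c, x, y):
    """True when the point (x, y) satisfies the equation a*x + b*y = c."""
    return a * x + b * y == c

# The two rows of the system: x - 2y = 1 and 3x + 2y = 11.
rows = [(1, -2, 1), (3, 2, 11)]

for point in [(1, 0), (3, 1), (101, 50)]:
    checks = [on_line(a, b, c, *point) for a, b, c in rows]
    print(point, checks)
```

Only (3, 1) passes both tests, which is what it means for that point to be the intersection of the two lines.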


Turn now to the column picture. I want to recognize the same linear system as a “vector 
equation”. Instead of numbers we need to see vectors. If you separate the original system 
into its columns instead of its rows, you get a vector equation: 


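$$\text{Combination equals } b \qquad x \begin{bmatrix} 1 \\ 3 \end{bmatrix} + y \begin{bmatrix} -2 \\ 2 \end{bmatrix} = \begin{bmatrix} 1 \\ 11 \end{bmatrix} = b. \qquad (2)$$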


This has two column vectors on the left side. The problem is to find the combination of 
those vectors that equals the vector on the right. We are multiplying the first column by x 
and the second column by y, and adding. With the right choices x = 3 and y = 1 (the 
same numbers as before), this produces 3(column 1) + 1(column 2) = b.


[Figure: column 1, column 2, and 3(column 1) on the left; the sum 3(column 1) + 1(column 2) = b on the right.]


Figure 2.2: Column picture: A combination of columns produces the right side (1,11). 


Figure 2.2 is the “column picture” of two equations in two unknowns. The first part 
shows the two separate columns, and that first column multiplied by 3. This multiplication 
by a scalar (a number) is one of the two basic operations in linear algebra: 


$$\text{Scalar multiplication} \qquad 3 \begin{bmatrix} 1 \\ 3 \end{bmatrix} = \begin{bmatrix} 3 \\ 9 \end{bmatrix}.$$

If the components of a vector v are v1 and v2, then cv has components cv1 and cv2.
The other basic operation is vector addition. We add the first components and the
second components separately. The vector sum is (1, 11) as desired:


$$\text{Vector addition} \qquad \begin{bmatrix} 3 \\ 9 \end{bmatrix} + \begin{bmatrix} -2 \\ 2 \end{bmatrix} = \begin{bmatrix} 1 \\ 11 \end{bmatrix}.$$


The right side of Figure 2.2 shows this addition. The sum along the diagonal is the vector 
b = (1, 11) on the right side of the linear equations. 

To repeat: The left side of the vector equation is a linear combination of the columns. 
The problem is to find the right coefficients x = 3 and y = 1. We are combining scalar 
multiplication and vector addition into one step. That step is crucially important, because 
it contains both of the basic operations: 

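$$\text{Linear combination} \qquad 3 \begin{bmatrix} 1 \\ 3 \end{bmatrix} + \begin{bmatrix} -2 \\ 2 \end{bmatrix} = \begin{bmatrix} 1 \\ 11 \end{bmatrix}.$$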
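The two basic operations, and their combination into one step, can be written as a short Python sketch. The function names `scale` and `add` are illustrative assumptions; the columns are those of the 2 by 2 example.

```python
def scale(c, v):
    """Scalar multiplication: multiply every component of v by c."""
    return [c * vi for vi in v]

def add(v, w):
    """Vector addition: add matching components of v and w."""
    return [vi + wi for vi, wi in zip(v, w)]

column1 = [1, 3]
column2 = [-2, 2]

# Linear combination 3(column 1) + 1(column 2).
b = add(scale(3, column1), scale(1, column2))
print(b)
```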


Of course the solution x = 3, y = 1 is the same as in the row picture. I don’t know 
which picture you prefer! I suspect that the two intersecting lines are more familiar at first. 
You may like the row picture better, but only for one day. My own preference is to combine 
column vectors. It is a lot easier to see a combination of four vectors in four-dimensional 
space, than to visualize how four hyperplanes might possibly meet at a point. (Even one 
hyperplane is hard enough. . .) 

The coefficient matrix on the left side of the equations is the 2 by 2 matrix A: 


$$\text{Coefficient matrix} \qquad A = \begin{bmatrix} 1 & -2 \\ 3 & 2 \end{bmatrix}.$$

This is very typical of linear algebra, to look at a matrix by rows and by columns. Its rows 
give the row picture and its columns give the column picture. Same numbers, different 
pictures, same equations. We write those equations as a matrix problem Ax = b: 


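$$Ax = b \qquad \begin{bmatrix} 1 & -2 \\ 3 & 2 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 1 \\ 11 \end{bmatrix}.$$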


The row picture deals with the two rows of A. The column picture combines the columns. 
The numbers x = 3 and y = 1 go into x. Here is matrix-vector multiplication: 


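$$\begin{array}{l} \text{Dot products with rows} \\ \text{Combination of columns} \end{array} \qquad Ax = b \text{ is } \begin{bmatrix} 1 & -2 \\ 3 & 2 \end{bmatrix} \begin{bmatrix} 3 \\ 1 \end{bmatrix} = \begin{bmatrix} 1 \\ 11 \end{bmatrix}.$$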


Looking ahead This chapter is going to solve n equations in n unknowns (for any n). 
I am not going at top speed, because smaller systems allow examples and pictures and a 
complete understanding. You are free to go faster, as long as matrix multiplication and 
inversion become clear. Those two ideas will be the keys to invertible matrices. 

I can list four steps to understanding elimination using matrices.

1. Elimination goes from A to a triangular U by a sequence of matrix steps E_{ij}.

2. The inverse matrices E_{ij}^{-1} in reverse order bring U back to the original A.

3. In matrix language that reverse order is A = LU = (lower triangle) (upper triangle).

4. Elimination succeeds if A is invertible. (It may need row exchanges.)


The most-used algorithm in computational science takes those steps (MATLAB calls it lu). 
But linear algebra goes beyond square invertible matrices! For m by n matrices, Ax = 0 
may have many solutions. Those solutions will go into a vector space. The rank of A 
leads to the dimension of that vector space. 

All this comes in Chapter 3, and I don’t want to hurry. But I must get there. 


Three Equations in Three Unknowns 


The three unknowns are x, y, z. We have three linear equations: 


$$Ax = b \qquad \begin{array}{rcl} x + 2y + 3z &=& 6 \\ 2x + 5y + 2z &=& 4 \\ 6x - 3y + z &=& 2 \end{array} \qquad (3)$$


We look for numbers x, y, z that solve all three equations at once. Those desired numbers 
might or might not exist. For this system, they do exist. When the number of unknowns 
matches the number of equations, there is usually one solution. Before solving the problem, 
we visualize it both ways: 


ROW The row picture shows three planes meeting at a single point. 


COLUMN The column picture combines three columns to produce (6, 4, 2). 


In the row picture, each equation produces a plane in three-dimensional space. The first 
plane in Figure 2.3 comes from the first equation x + 2y + 3z = 6. That plane crosses 
the x and y and z axes at the points (6, 0, 0) and (0, 3, 0) and (0, 0, 2). Those three points 
solve the equation and they determine the whole plane. 

The vector (x, y,z) = (0,0, 0) does not solve x + 2y + 3z = 6. Therefore that plane 
does not contain the origin. The plane x + 2y + 3z = 0 does pass through the origin, and 
it is parallel to x + 2y + 3z = 6. When the right side increases to 6, the parallel plane 
moves away from the origin. 

The second plane is given by the second equation 2x + 5y + 2z = 4. It intersects the 
first plane in a line L. The usual result of two equations in three unknowns is a line L of 
solutions. (Not if the equations were x + 2y + 3z = 6 and x + 2y + 3z = 0.) 

The third equation gives a third plane. It cuts the line L at a single point. That point 
lies on all three planes and it solves all three equations. It is harder to draw this triple 
intersection point than to imagine it. The three planes meet at the solution (which we 
haven’t found yet). The column form will now show immediately why z = 2.


[Figure: on the left, the plane x + 2y + 3z = 6 and the plane 2x + 5y + 2z = 4 meet in the line L. On the right, the third plane 6x - 3y + z = 2 cuts L at the solution (0, 0, 2). The origin (0, 0, 0) is not on these planes.]


Figure 2.3: Row picture: Two planes meet at a line, three planes at a point. 


The column picture starts with the vector form of the equations Ax = b: 


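$$\text{Combine columns} \qquad x \begin{bmatrix} 1 \\ 2 \\ 6 \end{bmatrix} + y \begin{bmatrix} 2 \\ 5 \\ -3 \end{bmatrix} + z \begin{bmatrix} 3 \\ 2 \\ 1 \end{bmatrix} = \begin{bmatrix} 6 \\ 4 \\ 2 \end{bmatrix}. \qquad (4)$$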


The unknowns are the coefficients x, y,z. We want to multiply the three column vectors 
by the correct numbers x, y, z to produce b = (6, 4, 2). 


[Figure: column 1 = (1, 2, 6), column 2 = (2, 5, -3), column 3 = (3, 2, 1), and b = (6, 4, 2), which is 2 times column 3.]


Figure 2.4: Column picture: (x, y, z) = (0, 0, 2) because 2(3, 2, 1) = (6, 4, 2) = b.


Figure 2.4 shows this column picture. Linear combinations of those columns can pro- 
duce any vector b! The combination that produces b = (6,4, 2) is just 2 times the third 
column. The coefficients we need are x = 0, y = 0, and z = 2.

The three planes in the row picture meet at that same solution point (0, 0, 2):


$$\text{Correct combination } (x, y, z) = (0, 0, 2) \qquad 0 \begin{bmatrix} 1 \\ 2 \\ 6 \end{bmatrix} + 0 \begin{bmatrix} 2 \\ 5 \\ -3 \end{bmatrix} + 2 \begin{bmatrix} 3 \\ 2 \\ 1 \end{bmatrix} = \begin{bmatrix} 6 \\ 4 \\ 2 \end{bmatrix}.$$


The Matrix Form of the Equations 


We have three rows in the row picture and three columns in the column picture (plus the 
right side). The three rows and three columns contain nine numbers. These nine numbers 
fill a 3 by 3 matrix A:


$$\text{The “coefficient matrix” in } Ax = b \text{ is} \qquad A = \begin{bmatrix} 1 & 2 & 3 \\ 2 & 5 & 2 \\ 6 & -3 & 1 \end{bmatrix}.$$


The capital letter A stands for all nine coefficients (in this square array). The letter 
b denotes the column vector with components 6, 4,2. The unknown x is also a column 
vector, with components x, y, z. (We use boldface because it is a vector, x because it is 
unknown.) By rows the equations were (3), by columns they were (4), and by matrices they 
are (5): 


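$$\text{Matrix equation } Ax = b \qquad \begin{bmatrix} 1 & 2 & 3 \\ 2 & 5 & 2 \\ 6 & -3 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 6 \\ 4 \\ 2 \end{bmatrix}. \qquad (5)$$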


Basic question: What does it mean to “multiply A times x”? We can multiply by rows or 
by columns. Either way, Ax = b must be a correct representation of the three equations. 
You do the same nine multiplications either way. 


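Multiplication by rows Ax comes from dot products, each row times the column x:


$$Ax = \begin{bmatrix} (\text{row 1}) \cdot x \\ (\text{row 2}) \cdot x \\ (\text{row 3}) \cdot x \end{bmatrix}. \qquad (6)$$

Multiplication by columns Ax is a combination of column vectors:


$$Ax = x\,(\text{column 1}) + y\,(\text{column 2}) + z\,(\text{column 3}).$$


When we substitute the solution x = (0, 0, 2), the multiplication Ax produces b:


$$\begin{bmatrix} 1 & 2 & 3 \\ 2 & 5 & 2 \\ 6 & -3 & 1 \end{bmatrix} \begin{bmatrix} 0 \\ 0 \\ 2 \end{bmatrix} = 2 \text{ times column 3} = \begin{bmatrix} 6 \\ 4 \\ 2 \end{bmatrix}.$$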


The dot product from the first row is (1, 2, 3) · (0, 0, 2) = 6. The other rows give dot
products 4 and 2. This book sees Ax as a combination of the columns of A. 
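Both ways of multiplying can be sketched in a few lines of Python. The function names `multiply_by_rows` and `multiply_by_columns` are illustrative assumptions; A and x are the matrix and solution of this section.

```python
A = [[1, 2, 3],
     [2, 5, 2],
     [6, -3, 1]]
x = [0, 0, 2]

def multiply_by_rows(A, x):
    """Each component of Ax is the dot product of one row of A with x."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def multiply_by_columns(A, x):
    """Ax is x[0]*(column 1) + x[1]*(column 2) + ..., added up."""
    result = [0] * len(A)
    for j, xj in enumerate(x):
        column = [row[j] for row in A]
        result = [r + xj * c for r, c in zip(result, column)]
    return result

print(multiply_by_rows(A, x))
print(multiply_by_columns(A, x))
```

The two functions perform the same nine multiplications in a different order, so they must agree for every x.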
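Example 1 Here are 3 by 3 matrices A and I = identity, with three 1’s and six 0’s:


$$Ax = \begin{bmatrix} 1 & 0 & 0 \\ 1 & 0 & 0 \\ 1 & 0 & 0 \end{bmatrix} \begin{bmatrix} 4 \\ 5 \\ 6 \end{bmatrix} = \begin{bmatrix} 4 \\ 4 \\ 4 \end{bmatrix} \qquad Ix = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 4 \\ 5 \\ 6 \end{bmatrix} = \begin{bmatrix} 4 \\ 5 \\ 6 \end{bmatrix}$$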


If you are a row person, the dot product of (1, 0, 0) with (4, 5, 6) is 4. If you are a column 
person, the linear combination Ax is 4 times the first column (1, 1, 1). In that matrix A, the 
second and third columns are zero vectors. 

The other matrix I is special. It has ones on the “main diagonal”. Whatever vector
this matrix multiplies, that vector is not changed. This is like multiplication by 1, but for
matrices and vectors. The exceptional matrix in this example is the 3 by 3 identity matrix:


$$I = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \text{ always yields the multiplication } Ix = x.$$


Matrix Notation 


The first row of a 2 by 2 matrix contains a_{11} and a_{12}. The second row contains a_{21} and
a_{22}. The first index gives the row number, so that a_{ij} is an entry in row i. The second index
j gives the column number. But those subscripts are not very convenient on a keyboard!
Instead of a_{ij} we type A(i, j). The entry a_{57} = A(5, 7) would be in row 5, column 7.


$$A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} = \begin{bmatrix} A(1,1) & A(1,2) \\ A(2,1) & A(2,2) \end{bmatrix}.$$


For an m by n matrix, the row index i goes from 1 to m. The column index j stops at n.
There are mn entries a_{ij} = A(i, j). A square matrix of order n has n^2 entries.


Multiplication in MATLAB


I want to express A and x and their product Ax using MATLAB commands. This is a first 
step in learning that language. I begin by defining the matrix A and the vector x. This 
vector is a 3 by 1 matrix, with three rows and one column. Enter matrices a row at a time, 
and use a semicolon to signal the end of a row: 


A = [1 2 3; 2 5 2; 6 -3 1]
x = [0; 0; 2]


Here are three ways to multiply Ax in MATLAB. In reality, A * x is the good way to do it. 
MATLAB is a high level language, and it works with matrices: 


Matrix multiplication      b = A * x

We can also pick out the first row of A (as a smaller matrix!). The notation for that 1
by 3 submatrix is A(1,:). Here the colon symbol keeps all columns of row 1:


Row at a time      b = [A(1,:) * x ; A(2,:) * x ; A(3,:) * x]


Each entry is a dot product, row times column, 1 by 3 matrix times 3 by 1 matrix. 

The other way to multiply uses the columns of A. The first column is the 3 by 1 sub- 
matrix A(:,1). Now the colon symbol : is keeping all rows of column 1. This column 
multiplies x(1) and the other columns multiply x(2) and x (3): 


Column at a time b = A(:,1) *x(1) + A(:,2) * x(2) + A(:,3) * x(3) 


I think that matrices are stored by columns. Then multiplying a column at a time will be a 
little faster. So A * x is actually executed by columns. 

You can see the same choice in a FORTRAN-type structure, which operates on single 
entries of A and x. This lower level language needs an outer and inner “DO loop”. When 
the outer loop uses the row number I, multiplication is a row at a time. The inner loop
J = 1,3 goes along each row I.

When the outer loop uses J, multiplication is a column at a time. I will do that in
MATLAB (which really needs two more lines “end” and “end” to close “for i” and “for j”).


FORTRAN by rows                          MATLAB by columns

DO 10 I = 1,3                            for j = 1:3

DO 10 J = 1,3                            for i = 1:3

10 B(I) = B(I) + A(I,J) * X(J)           b(i) = b(i) + A(i,j) * x(j)
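For comparison, the same two loop orders can be sketched in Python. This is an illustration only, since Python is not one of the languages used in the book; note that its indices start at 0 rather than 1, and that b must be initialized to zero before accumulating.

```python
A = [[1, 2, 3],
     [2, 5, 2],
     [6, -3, 1]]
x = [0, 0, 2]
n = 3

# Outer loop over rows i: each b_rows[i] is finished before the next row starts.
b_rows = [0] * n
for i in range(n):
    for j in range(n):
        b_rows[i] += A[i][j] * x[j]

# Outer loop over columns j: column j, scaled by x[j], is added to all of b_cols.
b_cols = [0] * n
for j in range(n):
    for i in range(n):
        b_cols[i] += A[i][j] * x[j]

print(b_rows, b_cols)
```

Swapping the two `for` lines changes only the order of the nine multiply-adds, not the result.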


Notice that MATLAB is sensitive to upper case versus lower case (capital letters and small 
letters). If the matrix is A then its entries are not a(i, j): not recognized. 

I think you will prefer the higher level A * x. FORTRAN won’t appear again in this 
book. Maple and Mathematica and graphing calculators also operate at the higher level. 
Multiplication is A.x in Mathematica. It is multiply(A, x); or equally evalm(A &* x);
in Maple. Those languages allow symbolic entries a,b, x,... and not only real numbers. 
Like MATLAB’s Symbolic Toolbox, they give the symbolic answer. 


REVIEW OF THE KEY IDEAS


1. The basic operations on vectors are multiplication cv and vector addition v + w. 
2. Together those operations give linear combinations cv + dw. 


3. Matrix-vector multiplication Ax can be computed by dot products, a row at a time. 
But Ax should be understood as a combination of the columns of A. 


4. Column picture: Ax = b asks for a combination of columns to produce b. 


5. Row picture: Each equation in Ax = b gives a line (n = 2) or a plane (n = 3) or a
“hyperplane” (n > 3). They intersect at the solution or solutions, if any.


WORKED EXAMPLES


2.1A Describe the column picture of these three equations Ax = b. Solve by careful 
inspection of the columns (instead of elimination): 


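$$\begin{array}{r} x + 3y + 2z = -3 \\ 2x + 2y + 2z = -2 \\ 3x + 5y + 6z = -5 \end{array} \quad \text{which is} \quad \begin{bmatrix} 1 & 3 & 2 \\ 2 & 2 & 2 \\ 3 & 5 & 6 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} -3 \\ -2 \\ -5 \end{bmatrix}$$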


Solution The column picture asks for a linear combination that produces b from the 
three columns of A. In this example b is minus the second column. So the solution is 
x =0,y = —1,z = 0. To show that (0, —1, 0) is the only solution we have to know that 
“A is invertible” and “the columns are independent” and “the determinant isn’t zero.” 

Those words are not yet defined but the test comes from elimination: We need 
(and for this matrix we find) a full set of three nonzero pivots. 

Suppose the right side changes to b = (4,4, 8) = sum of the first two columns. Then 
the good combination has x = 1, y = 1, z = 0. The solution becomes x = (1, 1,0). 


2.1B This system has no solution. The planes in the row picture don’t meet at a point. 
No combination of the three columns produces b. How to show this? 


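$$\begin{array}{r} x + 3y + 5z = 4 \\ x + 2y - 3z = 5 \\ 2x + 5y + 2z = 8 \end{array} \qquad \begin{bmatrix} 1 & 3 & 5 \\ 1 & 2 & -3 \\ 2 & 5 & 2 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 4 \\ 5 \\ 8 \end{bmatrix} = b$$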


(1) Multiply the equations by 1, 1,—1 and add to get 0 = 1. No solution. Are any two of 
the planes parallel? What are the equations of planes parallel to x + 3y + 5z = 4? 


(2) Take the dot product of each column of A (and also b) with y = (1, 1, -1).
How do those dot products show that the system Ax = b has no solution? 


(3) Find three right side vectors b* and b** and b*** that do allow solutions. 


Solution 
(1) Multiplying the equations by 1, 1, —1 and adding gives 0 = 1: 


$$\begin{array}{rcl} x + 3y + 5z &=& 4 \\ x + 2y - 3z &=& 5 \\ -[\,2x + 5y + 2z &=& 8\,] \\ \hline 0x + 0y + 0z &=& 1 \end{array} \qquad \text{No Solution}$$


The planes don’t meet at a point, even though no two planes are parallel. For a plane 
parallel to x + 3y + 5z = 4, change the “4”. The parallel plane x + 3y + 5z = 0 
goes through the origin (0, 0, 0). And the equation multiplied by any nonzero constant
still gives the same plane, as in 2x + 6y + 10z = 8.

(2) The dot product of each column of A with y = (1, 1, -1) is zero. On the right side,
y · b = (1, 1, -1) · (4, 5, 8) = 1 is not zero. So a solution is impossible.
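The dot product test in part (2) is easy to carry out in code. The following Python sketch uses a hypothetical helper `dot`; A, b, and y are taken from this worked example.

```python
A = [[1, 3, 5],
     [1, 2, -3],
     [2, 5, 2]]
b = [4, 5, 8]
y = [1, 1, -1]

def dot(v, w):
    """Dot product of two vectors of equal length."""
    return sum(vi * wi for vi, wi in zip(v, w))

# Pull out the three columns of A.
columns = [[row[j] for row in A] for j in range(3)]

# y is perpendicular to every column, so y . (Ax) = 0 for every x.
print([dot(y, col) for col in columns])

# But y . b is not zero, so b cannot be a combination of the columns.
print(dot(y, b))
```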


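(3) There is a solution when b is a combination of the columns. These three choices of
b have solutions x* = (1, 0, 0) and x** = (1, 1, 1) and x*** = (0, 0, 0):


$$b^* = \begin{bmatrix} 1 \\ 1 \\ 2 \end{bmatrix} = \text{first column} \qquad b^{**} = \begin{bmatrix} 9 \\ 0 \\ 9 \end{bmatrix} = \text{sum of columns} \qquad b^{***} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}$$


Problem Set 2.1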


Problems 1-8 are about the row and column pictures of Ax = b. 


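1 With A = I (the identity matrix) draw the planes in the row picture. Three sides of
a box meet at the solution x = (x, y, z) = (2, 3, 4):


$$\begin{array}{r} 1x + 0y + 0z = 2 \\ 0x + 1y + 0z = 3 \\ 0x + 0y + 1z = 4 \end{array} \quad \text{or} \quad \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 2 \\ 3 \\ 4 \end{bmatrix}$$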


Draw the vectors in the column picture. Two times column 1 plus three times column 
2 plus four times column 3 equals the right side b. 


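2 If the equations in Problem 1 are multiplied by 2, 3, 4 they become DX = B:


$$\begin{array}{r} 2x + 0y + 0z = 4 \\ 0x + 3y + 0z = 9 \\ 0x + 0y + 4z = 16 \end{array} \quad \text{or} \quad DX = \begin{bmatrix} 2 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 4 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 4 \\ 9 \\ 16 \end{bmatrix} = B$$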


Why is the row picture the same? Is the solution X the same as x? What is changed 
in the column picture—the columns or the right combination to give B? 


3 If equation 1 is added to equation 2, which of these are changed: the planes in the 
row picture, the vectors in the column picture, the coefficient matrix, the solution? 
The new equations in Problem 1 would be x = 2, x + y = 5,z = 4. 


4 Find a point with z = 2 on the intersection line of the planes x + y + 3z = 6 and 
x - y + z = 4. Find the point with z = 0. Find a third point halfway between.


5 The first of these equations plus the second equals the third: 
x+ y+ z=2 
x+2y+ z=3 

2x + 3y + 2z = 5. 


The first two planes meet along a line. The third plane contains that line, because 
if x, y, z satisfy the first two equations then they also ____. The equations have
infinitely many solutions (the whole line L). Find three solutions on L.


6 Move the third plane in Problem 5 to a parallel plane 2x + 3y + 2z = 9. Now the
three equations have no solution. Why not? The first two planes meet along the line
L, but the third plane doesn’t ____ that line.


7 In Problem 5 the columns are (1, 1, 2) and (1, 2, 3) and (1, 1, 2). This is a “singular
case” because the third column is ____. Find two combinations of the columns that
give b = (2, 3, 5). This is only possible for b = (4, 6, c) if c = ____.


8 Normally 4 “planes” in 4-dimensional space meet at a ____. Normally 4 column
vectors in 4-dimensional space can combine to produce b. What combination
of (1, 0, 0, 0), (1, 1, 0, 0), (1, 1, 1, 0), (1, 1, 1, 1) produces b = (3, 3, 3, 2)? What 4
equations for x, y, z, t are you solving?


Problems 9-14 are about multiplying matrices and vectors. 


9 Compute each Ax by dot products of the rows with the column vector: 


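$$\text{(a)} \quad \begin{bmatrix} 1 & 2 & 4 \\ -2 & 3 & 1 \\ -4 & 1 & 2 \end{bmatrix} \begin{bmatrix} 2 \\ 2 \\ 3 \end{bmatrix} \qquad \text{(b)} \quad \begin{bmatrix} 2 & 1 & 0 & 0 \\ 1 & 2 & 1 & 0 \\ 0 & 1 & 2 & 1 \\ 0 & 0 & 1 & 2 \end{bmatrix} \begin{bmatrix} 1 \\ 1 \\ 1 \\ 2 \end{bmatrix}$$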


10 Compute each Ax in Problem 9 as a combination of the columns: 


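$$\text{9(a) becomes} \quad Ax = 2 \begin{bmatrix} 1 \\ -2 \\ -4 \end{bmatrix} + 2 \begin{bmatrix} 2 \\ 3 \\ 1 \end{bmatrix} + 3 \begin{bmatrix} 4 \\ 1 \\ 2 \end{bmatrix} = \text{____}.$$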


How many separate multiplications for Ax, when the matrix is “3 by 3”? 


11 Find the two components of Ax by rows or by columns: 


EE) JL = 23 yp 


12 Multiply A times x to find three components of Ax: 


$$\begin{bmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \end{bmatrix} \quad \text{and} \quad \begin{bmatrix} 2 & 1 & 3 \\ 1 & 2 & 3 \\ 3 & 3 & 6 \end{bmatrix} \begin{bmatrix} 1 \\ 1 \\ -1 \end{bmatrix} \quad \text{and} \quad \begin{bmatrix} 2 & 1 \\ 1 & 2 \\ 3 & 3 \end{bmatrix} \begin{bmatrix} 1 \\ 1 \end{bmatrix}.$$


13 (a) A matrix with m rows and n columns multiplies a vector with ____ components
to produce a vector with ____ components.
(b) The planes from the m equations Ax = b are in ____-dimensional space.
The combination of the columns of A is in ____-dimensional space.


14 Write 2x + 3y + z + 5t = 8 as a matrix A (how many rows?) multiplying the column
vector x = (x, y, z, t) to produce b. The solutions x fill a plane or “hyperplane”
in 4-dimensional space. The plane is 3-dimensional with no 4D volume.


Problems 15-22 ask for matrices that act in special ways on vectors. 

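15 (a) What is the 2 by 2 identity matrix? I times $\begin{bmatrix} x \\ y \end{bmatrix}$ equals $\begin{bmatrix} x \\ y \end{bmatrix}$.
(b) What is the 2 by 2 exchange matrix? P times $\begin{bmatrix} x \\ y \end{bmatrix}$ equals $\begin{bmatrix} y \\ x \end{bmatrix}$.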

16 (a) What 2 by 2 matrix R rotates every vector by 90°? R times $\begin{bmatrix} x \\ y \end{bmatrix}$ is $\begin{bmatrix} y \\ -x \end{bmatrix}$.
(b) What 2 by 2 matrix R^2 rotates every vector by 180°?


17 Find the matrix P that multiplies (x, y, z) to give (y, z, x). Find the matrix Q that 
multiplies (y, z, x) to bring back (x, y, z). 


18 What 2 by 2 matrix E subtracts the first component from the second component? 
What 3 by 3 matrix does the same? 


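$$E \begin{bmatrix} 3 \\ 5 \end{bmatrix} = \begin{bmatrix} 3 \\ 2 \end{bmatrix} \quad \text{and} \quad E \begin{bmatrix} 3 \\ 5 \\ 7 \end{bmatrix} = \begin{bmatrix} 3 \\ 2 \\ 7 \end{bmatrix}$$


19 What 3 by 3 matrix E multiplies (x, y, z) to give (x, y, z + x)? What matrix E^{-1}
multiplies (x, y, z) to give (x, y, z - x)? If you multiply (3, 4, 5) by E and then
multiply by E^{-1}, the two results are ( ____ ) and ( ____ ).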


20 What 2 by 2 matrix P1 projects the vector (x, y) onto the x axis to produce (x, 0)?
What matrix P2 projects onto the y axis to produce (0, y)? If you multiply (5, 7)
by P1 and then multiply by P2, you get ( ____ ) and ( ____ ).


21 What 2 by 2 matrix R rotates every vector through 45°? The vector (1, 0) goes to
(√2/2, √2/2). The vector (0, 1) goes to (-√2/2, √2/2). Those determine the
matrix. Draw these particular vectors in the xy plane and find R.


22 Write the dot product of (1, 4, 5) and (x, y, z) as a matrix multiplication Ax. The
matrix A has one row. The solutions to Ax = 0 lie on a ____ perpendicular to the
vector ____. The columns of A are only in ____-dimensional space.


23 In MATLAB notation, write the commands that define this matrix A and the column 
vectors x and b. What command would test whether or not Ax = b? 


Ey sE M 


24 The MATLAB commands A = eye(3) and v = [3:5]' produce the 3 by 3 identity
matrix and the column vector (3, 4, 5). What are the outputs from A*v and v'*v?
(Computer not needed!) If you ask for v*A, what happens?


25 If you multiply the 4 by 4 all-ones matrix A = ones(4) and the column v = ones(4,1),
what is A*v? (Computer not needed.) If you multiply B = eye(4) + ones(4) times
w = zeros(4,1) + 2*ones(4,1), what is B*w?


Questions 26-28 review the row and column pictures in 2, 3, and 4 dimensions. 


26 Draw the row and column pictures for the equations x — 2y = 0,x + y = 6. 


27 For two linear equations in three unknowns x, y, z, the row picture will show (2 or 3)
(lines or planes) in (2 or 3)-dimensional space. The column picture is in (2 or 3)-
dimensional space. The solutions normally lie on a ____.


28 For four linear equations in two unknowns x and y, the row picture shows four
____. The column picture is in ____-dimensional space. The equations have no
solution unless the vector on the right side is a combination of ____.


29 Start with the vector u0 = (1, 0). Multiply again and again by the same “Markov
matrix” A = [.8 .3; .2 .7]. The next three vectors are u1, u2, u3:


u1 = [.8 .3; .2 .7] [1; 0] = [.8; .2]      u2 = Au1 = ____      u3 = Au2 = ____.


What property do you notice for all four vectors u0, u1, u2, u3?

Challenge Problems 
30 Continue Problem 29 from u0 = (1, 0) to u7, and also from v0 = (0, 1) to v7.
What do you notice about u7 and v7? Here are two MATLAB codes, with while and
for. They plot u0 to u7 and v0 to v7. You can use other languages:


u = [1 ; 0]; A = [.8 .3 ; .2 .7];        v = [0 ; 1]; A = [.8 .3 ; .2 .7];
x = u; k = [0 : 7];                      x = v; k = [0 : 7];
while size(x,2) <= 7                     for j = 1 : 7
    u = A*u; x = [x u];                      v = A*v; x = [x v];
end                                      end
plot(k, x)                               plot(k, x)


The u’s and v’s are approaching a steady state vector s. Guess that vector and check 
that As = s. If you start with s, you stay with s. 
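

Since other languages are welcome, here is one possible Python version of the same iteration. It is only a sketch: the helper names apply and iterate are invented for this illustration, exact fractions stand in for MATLAB's floating point numbers, and the vectors are printed instead of plotted.

```python
from fractions import Fraction

# Markov matrix from Problem 29, stored as exact fractions (an illustrative choice).
A = [[Fraction(8, 10), Fraction(3, 10)],
     [Fraction(2, 10), Fraction(7, 10)]]

def apply(A, u):
    """Return A times u: each component is a row of A dotted with u."""
    return [sum(a * x for a, x in zip(row, u)) for row in A]

def iterate(A, start, steps):
    """Return [u0, u1, ..., u_steps] where each vector is A times the previous one."""
    history = [list(start)]
    for _ in range(steps):
        history.append(apply(A, history[-1]))
    return history

us = iterate(A, [Fraction(1), Fraction(0)], 7)   # u0 to u7
vs = iterate(A, [Fraction(0), Fraction(1)], 7)   # v0 to v7

for k, (u, v) in enumerate(zip(us, vs)):
    print(k, [float(x) for x in u], [float(x) for x in v])
```

Comparing the two printed columns is one way to make the guess about the steady state before checking that As = s.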


31 Invent a 3 by 3 magic matrix M3 with entries 1,2,...,9. All rows and columns 
and diagonals add to 15. The first row could be 8,3, 4. What is M3 times (1,1, 1)? 
What is M4 times (1,1, 1, 1) if a 4 by 4 magic matrix has entries 1,..., 16? 


32 Suppose u and v are the first two columns of a 3 by 3 matrix A. Which third columns 
w would make this matrix singular? Describe a typical column picture of Ax = b 
in that singular case, and a typical row picture (for a random b).


33 Multiplying by A is a “linear transformation”. Those important words mean:


If w is a combination of u and v, then Aw is the same combination of Au and Av. 


It is this “linearity” Aw = cAu + dAv that gives us the name linear algebra. 


Problem: If u = [1; 0] and v = [0; 1] then Au and Av are the columns of A.


Combine w = cu + dv. If w = [c; d], how is Aw connected to Au and Av?


34 Start from the four equations −x_{i+1} + 2x_i − x_{i−1} = i (for i = 1, 2, 3, 4 with
x_0 = x_5 = 0). Write those equations in their matrix form Ax = b. Can you solve
them for x_1, x_2, x_3, x_4?


35 A 9 by 9 Sudoku matrix S has the numbers 1,...,9 in every row and column, and
in every 3 by 3 block. For the all-ones vector x = (1,..., 1), what is Sx? 


A better question is: Which row exchanges will produce another Sudoku matrix? 
Also, which exchanges of block rows give another Sudoku matrix? 


Section 2.7 will look at all possible permutations (reorderings) of the rows. I can see 
6 orders for the first 3 rows, all giving Sudoku matrices. Also 6 permutations of the 
next 3 rows, and of the last 3 rows. And 6 block permutations of the block rows?


2.2 The Idea of Elimination


This chapter explains a systematic way to solve linear equations. The method is called 
“elimination”, and you can see it immediately in our 2 by 2 example. Before elimination, 
x and y appear in both equations. After elimination, the first unknown x has disappeared 
from the second equation 8y = 8: 


Before    x − 2y = 1         After    x − 2y = 1    (multiply equation 1 by 3)
         3x + 2y = 11                     8y = 8    (subtract to eliminate 3x)


The new equation 8y = 8 instantly gives y = 1. Substituting y = 1 back into the first
equation leaves x − 2 = 1. Therefore x = 3 and the solution (x, y) = (3, 1) is complete.


Elimination produces an upper triangular system, and this is the goal. The nonzero
coefficients 1, −2, 8 form a triangle. That system is solved from the bottom upwards:
first y = 1 and then x = 3. This quick process is called back substitution. It is used for
upper triangular systems of any size, after elimination gives a triangle. 

Important point: The original equations have the same solution x = 3 and y = 1. 
Figure 2.5 shows each system as a pair of lines, intersecting at the solution point (3, 1). 
After elimination, the lines still meet at the same point. Every step worked with correct 
equations. 


How did we get from the first pair of lines to the second pair? We subtracted 3 times 
the first equation from the second equation. The step that eliminates x from equation 2 is 
the fundamental operation in this chapter. We use it so often that we look at it closely: 


To eliminate x: Subtract a multiple of equation 1 from equation 2. 


Three times x − 2y = 1 gives 3x − 6y = 3. When this is subtracted from 3x + 2y = 11,
the right side becomes 8. The main point is that 3x cancels 3x. What remains on the left
side is 2y − (−6y) or 8y, and x is eliminated. The system became triangular.

Ask yourself how that multiplier ℓ = 3 was found. The first equation contains 1x.
So the first pivot was 1 (the coefficient of x). The second equation contains 3x, so the 
multiplier was 3. Then subtraction 3x — 3x produced the zero and the triangle. 


[Figure 2.5, two panels. Before elimination: the lines x − 2y = 1 and 3x + 2y = 11. After elimination: the lines x − 2y = 1 and 8y = 8.]

Figure 2.5: Eliminating x makes the second line horizontal. Then 8y = 8 gives y = 1.


You will see the multiplier rule if I change the first equation to 4x − 8y = 4. (Same
straight line but the first pivot becomes 4.) The correct multiplier is now ℓ = 3/4. To find the
multiplier, divide the coefficient “3” to be eliminated by the pivot “4”:


4x − 8y = 4      Multiply equation 1 by 3/4      4x − 8y = 4
3x + 2y = 11     Subtract from equation 2             8y = 8.


The final system is triangular and the last equation still gives y = 1. Back substitution
produces 4x − 8 = 4 and 4x = 12 and x = 3. We changed the numbers but not the lines
or the solution. Divide by the pivot to find that multiplier ℓ = 3/4.


Pivot = first nonzero in the row that does the elimination
Multiplier = (entry to eliminate) divided by (pivot) = 3/4


The new second equation starts with the second pivot, which is 8. We would use it to 
eliminate y from the third equation if there were one. To solve n equations we want n 
pivots. The pivots are on the diagonal of the triangle after elimination. 
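

The multiplier rule can be tried on both versions of the first equation. The following is a minimal Python sketch, not a general solver: the function names eliminate_2x2 and back_substitute_2x2 are invented here, each equation is stored as a triple (coefficient of x, coefficient of y, right side), and both pivots are assumed to be nonzero.

```python
from fractions import Fraction

def eliminate_2x2(eq1, eq2):
    """Each equation is (coefficient of x, coefficient of y, right side).
    Returns (multiplier, new equation 2). Assumes the first pivot eq1[0] is nonzero."""
    pivot = Fraction(eq1[0])
    multiplier = Fraction(eq2[0]) / pivot            # (entry to eliminate) / (pivot)
    new_eq2 = tuple(Fraction(b) - multiplier * Fraction(a) for a, b in zip(eq1, eq2))
    return multiplier, new_eq2

def back_substitute_2x2(eq1, new_eq2):
    """Solve the triangular system from the bottom up. Assumes a nonzero second pivot."""
    y = new_eq2[2] / new_eq2[1]
    x = (Fraction(eq1[2]) - Fraction(eq1[1]) * y) / Fraction(eq1[0])
    return x, y

# Same second equation 3x + 2y = 11, with the two versions of equation 1 from the text.
for eq1 in [(1, -2, 1), (4, -8, 4)]:
    multiplier, new_eq2 = eliminate_2x2(eq1, (3, 2, 11))
    x, y = back_substitute_2x2(eq1, new_eq2)
    print(f"multiplier {multiplier}: {new_eq2[1]}y = {new_eq2[2]}, so (x, y) = ({x}, {y})")
```

The multipliers differ (3 and then 3/4) but the new second equation and the solution do not, which matches the remark that the numbers changed but not the lines.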

You could have solved those equations for x and y without reading this book. It is an 
extremely humble problem, but we stay with it a little longer. Even for a 2 by 2 system, 
elimination might break down. By understanding the possible breakdown (when we can’t 
find a full set of pivots), you will understand the whole process of elimination. 


Breakdown of Elimination 


Normally, elimination produces the pivots that take us to the solution. But failure is
possible. At some point, the method might ask us to divide by zero. We can’t do it. The process
has to stop. There might be a way to adjust and continue—or failure may be unavoidable. 


Example 1 fails with no solution to 0y = 8. Example 2 fails with too many solutions to
0y = 0. Example 3 succeeds by exchanging the equations.


Example 1 Permanent failure with no solution. Elimination makes this clear: 


 x − 2y = 1       Subtract 3 times         x − 2y = 1
3x − 6y = 11      eqn. 1 from eqn. 2           0y = 8.


There is no solution to 0y = 8. Normally we divide the right side 8 by the second pivot,
but this system has no second pivot. (Zero is never allowed as a pivot!) The row and
column pictures in Figure 2.6 show why failure was unavoidable. If there is no solution,
elimination will discover that fact by reaching an equation like 0y = 8.

The row picture of failure shows parallel lines—which never meet. A solution must lie 
on both lines. With no meeting point, the equations have no solution. 

The column picture shows the two columns (1, 3) and (−2, −6) in the same direction.
All combinations of the columns lie along a line. But the column from the right side is in
a different direction (1, 11). No combination of the columns can produce this right side,
therefore no solution.

When we change the right side to (1, 3), failure shows as a whole line of solution points. 
Instead of no solution, next comes Example 2 with infinitely many.


[Figure 2.6, two panels. Row picture: the parallel lines x − 2y = 1 and 3x − 6y = 11. Column picture: the first column (1, 3) and the second column (−2, −6) point along one line, and the columns don’t combine to give b = (1, 11).]

Figure 2.6: Row picture and column picture for Example 1: no solution.


Example 2 Failure with infinitely many solutions. Change b = (1, 11) to (1,3). 


 x − 2y = 1      Subtract 3 times         x − 2y = 1      Still only
3x − 6y = 3      eqn. 1 from eqn. 2           0y = 0      one pivot.


Every y satisfies 0y = 0. There is really only one equation x − 2y = 1. The unknown y
is “free”. After y is freely chosen, x is determined as x = 1 + 2y. 

In the row picture, the parallel lines have become the same line. Every point on that 
line satisfies both equations. We have a whole line of solutions in Figure 2.7. 

In the column picture, b = (1, 3) is now the same as column 1. So we can choose
x = 1 and y = 0. We can also choose x = 0 and y = −1/2; column 2 times −1/2 equals b.
Every (x, y) that solves the row problem also solves the column problem.


Failure   For n equations we do not get n pivots.
Elimination leads to an equation 0 ≠ 0 (no solution) or 0 = 0 (many solutions).
Success comes with n pivots. But we may have to exchange the n equations.


Elimination can go wrong in a third way, but this time it can be fixed. Suppose the first
pivot position contains zero. We refuse to allow zero as a pivot. When the first equation
has no term involving x, we can exchange it with an equation below:


Example 3 Temporary failure (zero in pivot). A row exchange produces two pivots: 


Permutation     0x + 2y = 4      Exchange the        3x − 2y = 5
                3x − 2y = 5      two equations            2y = 4.


The new system is already triangular. This small example is ready for back substitution. 
The last equation gives y = 2, and then the first equation gives x = 3. The row picture is 
normal (two intersecting lines). The column picture is also normal (column vectors not in 
the same direction). The pivots 3 and 2 are normal, but a row exchange was required.


[Figure 2.7, two panels. Row picture: the same line from both equations, with solutions all along this line. Column picture: the right hand side (1, 3) lies on the line of columns, and −1/2 (second column) = (1, 3).]

Figure 2.7: Row and column pictures for Example 2: infinitely many solutions.


Examples 1 and 2 are singular—there is no second pivot. Example 3 is nonsingular — 
there is a full set of pivots and exactly one solution. Singular equations have no solution or 
infinitely many solutions. Pivots must be nonzero because we have to divide by them. 
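

Examples 1 to 3 can be run through one small routine. This Python sketch is an illustration under stated assumptions: the name solve_2x2 and its return labels are invented here, each equation is a triple (coefficient of x, coefficient of y, right side), and at least one equation is assumed to contain x.

```python
from fractions import Fraction

def solve_2x2(rows):
    """rows = [(a, b, f), (c, d, g)] stands for ax + by = f and cx + dy = g.
    Returns ('unique', (x, y)), ('none', None) or ('infinite', None)."""
    r1, r2 = [[Fraction(v) for v in row] for row in rows]
    if r1[0] == 0 and r2[0] != 0:
        r1, r2 = r2, r1                      # temporary failure: exchange the equations
    if r1[0] == 0:
        raise ValueError("this sketch assumes at least one equation contains x")
    multiplier = r2[0] / r1[0]               # (entry to eliminate) / (pivot)
    r2 = [b - multiplier * a for a, b in zip(r1, r2)]
    if r2[1] == 0:                           # no second pivot: the system is singular
        return ("none", None) if r2[2] != 0 else ("infinite", None)
    y = r2[2] / r2[1]                        # back substitution, bottom equation first
    x = (r1[2] - r1[1] * y) / r1[0]
    return ("unique", (x, y))

print(solve_2x2([(1, -2, 1), (3, -6, 11)]))  # Example 1: 0y = 8
print(solve_2x2([(1, -2, 1), (3, -6, 3)]))   # Example 2: 0y = 0
print(solve_2x2([(0, 2, 4), (3, -2, 5)]))    # Example 3: row exchange first
```

The singular cases are detected at the same place, the missing second pivot; only the right side decides between no solution and infinitely many.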


Three Equations in Three Unknowns 


To understand Gaussian elimination, you have to go beyond 2 by 2 systems. Three by three 
is enough to see the pattern. For now the matrices are square—an equal number of rows 
and columns. Here is a 3 by 3 system, specially constructed so that all steps lead to whole 
numbers and not fractions: 


  2x + 4y − 2z = 2
  4x + 9y − 3z = 8         (1)
 −2x − 3y + 7z = 10


What are the steps? The first pivot is the boldface 2 (upper left). Below that pivot we want
to eliminate the 4. The first multiplier is the ratio 4/2 = 2. Multiply the pivot equation by
ℓ21 = 2 and subtract. Subtraction removes the 4x from the second equation:


Step 1 Subtract 2 times equation 1 from equation 2. This leaves y + z = 4. 


We also eliminate −2x from equation 3, still using the first pivot. The quick way is to add
equation 1 to equation 3. Then 2x cancels −2x. We do exactly that, but the rule in this book
is to subtract rather than add. The systematic pattern has multiplier ℓ31 = −2/2 = −1.
Subtracting −1 times an equation is the same as adding:


Step 2 Subtract −1 times equation 1 from equation 3. This leaves y + 5z = 12.


The two new equations involve only y and z. The second pivot (in boldface) is 1:


x is eliminated      1y + 1z = 4
                     1y + 5z = 12


We have reached a 2 by 2 system. The final step eliminates y to make it 1 by 1:


Step 3 Subtract equation 2new from 3new. The multiplier is 1/1 = 1. Then 4z = 8.


The original Ax = b has been converted into an upper triangular Ux = c:


  2x + 4y − 2z = 2                        2x + 4y − 2z = 2
  4x + 9y − 3z = 8      has become             1y + 1z = 4       (2)
 −2x − 3y + 7z = 10                                 4z = 8.


The goal is achieved: forward elimination is complete from A to U. Notice the pivots
2, 1, 4 along the diagonal of U. The pivots 1 and 4 were hidden in the original system.
Elimination brought them out. Ux = c is ready for back substitution, which is quick:


(4z = 8 gives z = 2)    (y + z = 4 gives y = 2)    (equation 1 gives x = −1)


The solution is (x, y, z) = (−1, 2, 2). The row picture has three planes from three
equations. All the planes go through this solution. The original planes are sloping, but the last
plane 4z = 8 after elimination is horizontal.

The column picture shows a combination Ax of column vectors producing the right 
side b. The coefficients in that combination are —1, 2, 2 (the solution): 


Ax = (−1) [2; 4; −2] + 2 [4; 9; −3] + 2 [−2; −3; 7]   equals   [2; 8; 10] = b.      (3)


The numbers x, y, z multiply columns 1, 2,3 in Ax = b and also in the triangular Ux = c. 
For a 4 by 4 problem, or an n by n problem, elimination proceeds the same way. Here 
is the whole idea, column by column from A to U, when elimination succeeds. 


Column 1. Use the first equation to create zeros below the first pivot. 
Column 2. Use the new equation 2 to create zeros below the second pivot. 
Columns 3 to n. Keep going to find all n pivots and the triangular U. 


After column 2 we have          We want

    x x x x                       x x x x
    0 x x x                         x x x
    0 0 x x                           x x          (4)
    0 0 x x                             x

The result of forward elimination is an upper triangular system. It is nonsingular if there 
is a full set of n pivots (never zero!). Question: Which x on the left could be changed 
to boldface x because the pivot is known? Here is a final example to show the original 
Ax = b, the triangular system Ux = c, and the solution (x, y, z) from back substitution: 


x +  y +  z = 6                  x + y + z = 6               x = 3
x + 2y + 2z = 9      Forward         y + z = 3      Back     y = 2
x + 2y + 3z = 10                         z = 1               z = 1


All multipliers are 1. All pivots are 1. All planes meet at the solution (3, 2, 1). The columns
of A combine with 3, 2, 1 to give b = (6, 9, 10). The triangle shows Ux = c = (6, 3, 1).
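

The column-by-column procedure and back substitution can be written out for any n by n system. What follows is a hedged Python sketch rather than a production solver: the names forward_eliminate and back_substitute are invented here, exact fractions are used so the pivots come out as whole numbers, and a row exchange picks the first nonzero entry below a zero pivot.

```python
from fractions import Fraction

def forward_eliminate(A, b):
    """Reduce Ax = b to upper triangular Ux = c and return (U, c, pivots).
    Exchanges rows when a zero sits in the pivot position. Raises ValueError
    when a column has no nonzero entry available (the singular case)."""
    n = len(A)
    U = [[Fraction(v) for v in row] for row in A]
    c = [Fraction(v) for v in b]
    for j in range(n):
        # Look on or below the diagonal for a nonzero entry to use as pivot j.
        p = next((i for i in range(j, n) if U[i][j] != 0), None)
        if p is None:
            raise ValueError(f"no pivot in column {j + 1}: the system is singular")
        U[j], U[p] = U[p], U[j]
        c[j], c[p] = c[p], c[j]
        for i in range(j + 1, n):
            multiplier = U[i][j] / U[j][j]   # (entry to eliminate) / (pivot)
            U[i] = [u_i - multiplier * u_j for u_i, u_j in zip(U[i], U[j])]
            c[i] -= multiplier * c[j]
    return U, c, [U[k][k] for k in range(n)]

def back_substitute(U, c):
    """Solve the triangular system Ux = c from the bottom upwards."""
    n = len(U)
    x = [Fraction(0)] * n
    for i in reversed(range(n)):
        known = sum(U[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (c[i] - known) / U[i][i]
    return x

U, c, pivots = forward_eliminate([[2, 4, -2], [4, 9, -3], [-2, -3, 7]], [2, 8, 10])
print([int(p) for p in pivots], [int(v) for v in back_substitute(U, c)])
```

Applied to system (1) this reproduces the pivots 2, 1, 4 and the solution (−1, 2, 2) found above; the final example with b = (6, 9, 10) can be checked the same way.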


= REVIEW OF THE KEY IDEAS =


1. A linear system (Ax = b) becomes upper triangular (Ux = c) after elimination.


2. We subtract ℓij times equation j from equation i, to make the (i, j) entry zero.


3. The multiplier is ℓij = (entry to eliminate in row i) / (pivot in row j). Pivots can not be zero!


4. A zero in the pivot position can be repaired if there is a nonzero below it. 
5. The upper triangular system is solved by back substitution (starting at the bottom). 


6. When breakdown is permanent, the system has no solution or infinitely many. 


= WORKED EXAMPLES =


2.2A When elimination is applied to this matrix A, what are the first and second pivots?
What is the multiplier ℓ21 in the first step (ℓ21 times row 1 is subtracted from row 2)?


A has a first difference in row 1 and a second difference −1, 2, −1 in row 2.


A = [1 −1 0; −1 2 −1; 0 −1 2]   →   [1 −1 0; 0 1 −1; 0 −1 2]   →   U = [1 −1 0; 0 1 −1; 0 0 1]


What entry in the 2,2 position (instead of 2) would force an exchange of rows 2 and 3?
Why is the lower left multiplier ℓ31 = 0, subtracting zero times row 1 from row 3?
If you change the corner entry from a33 = 2 to a33 = 1, why does elimination fail?


Solution The first pivot is 1. The multiplier ℓ21 is −1/1 = −1. When −1 times row 1
is subtracted (so row 1 is added to row 2), the second pivot is revealed as 1.

If we reduce the middle entry “2” to “1”, that would force a row exchange. (Zero will
appear in the second pivot position.) The multiplier ℓ31 is zero because a31 = 0. A zero at
the start of a row needs no elimination. This A is a “band matrix”.

The last pivot is 1. So if the original corner entry a33 is reduced by 1 (to a33 = 1), 
elimination would produce 0. No third pivot, elimination fails. 


2.2B Suppose A is already a triangular matrix (upper triangular or lower triangular).
Where do you see its pivots? When does Ax = b have exactly one solution for every b? 


Solution The pivots of a triangular matrix are already set along the main diagonal.
Elimination succeeds when all those numbers are nonzero. Use back substitution when A is
upper triangular, go forward when A is lower triangular.


2.2C Use elimination to reach upper triangular matrices U. Solve by back substitution
or explain why this is impossible. What are the pivots (never zero)? Exchange equations
when necessary. The only difference is the −x in the last equation.


Success     x + y + z = 7         x + y + z = 7
then        x + y − z = 5         x + y − z = 5
Failure     x − y + z = 3        −x − y + z = 3


Solution For the first system, subtract equation 1 from equations 2 and 3 (the multipliers
are ℓ21 = 1 and ℓ31 = 1). The 2, 2 entry becomes zero, so exchange equations:


            x + y + z = 7                             x + y + z = 7
Success       0y − 2z = −2      exchanges into      −2y + 0z = −4
            −2y + 0z = −4                                 −2z = −2


Then back substitution gives z = 1 and y = 2 and x = 4. The pivots are 1, −2, −2.
For the second system, subtract equation 1 from equation 2 as before. Add equation 1 
to equation 3. This leaves zero in the 2, 2 entry and also below: 


            x + y + z = 7       There is no pivot in column 2 (it was equal to column 1)
Failure       0y − 2z = −2      A further elimination step gives 0z = 8
              0y + 2z = 10      The three planes don’t meet


Plane 1 meets plane 2 in a line. Plane 1 meets plane 3 in a parallel line. No solution. 
If we change the “3” in the original third equation to “−5” then elimination would lead
to 0 = 0. There are infinitely many solutions! The three planes now meet along a whole line.
Changing 3 to −5 moved the third plane to meet the other two. The second equation
gives z = 1. Then the first equation leaves x + y = 6. No pivot in column 2 makes y 
free (it can have any value). Then x = 6 — y. 


Problem Set 2.2 


Problems 1-10 are about elimination on 2 by 2 systems. 
1 What multiple ℓ21 of equation 1 should be subtracted from equation 2?

2x + 3y = 1
10x + 9y = 11.


After this elimination step, write down the upper triangular system and circle the two 
pivots. The numbers 1 and 11 have no influence on those pivots. 


2 Solve the triangular system of Problem 1 by back substitution, y before x. Verify 
that x times (2,10) plus y times (3,9) equals (1, 11). If the right side changes to 
(4, 44), what is the new solution? 




3 What multiple of equation 1 should be subtracted from equation 2?

2x − 4y = 6
−x + 5y = 0.


After this elimination step, solve the triangular system. If the right side changes to 
(—6, 0), what is the new solution? 


4 What multiple ℓ of equation 1 should be subtracted from equation 2 to remove c?

ax + by = f
cx + dy = g.


The first pivot is a (assumed nonzero). Elimination produces what formula for the
second pivot? What is y? The second pivot is missing when ad = bc: singular.


5 Choose a right side which gives no solution and another right side which gives
infinitely many solutions. What are two of those solutions?


Singular system     3x + 2y = 10
                    6x + 4y = ____


6 Choose a coefficient b that makes this system singular. Then choose a right side g
that makes it solvable. Find two solutions in that singular case.

2x + by = 16
4x + 8y = g.


7 For which numbers a does elimination break down (1) permanently (2) temporarily?

ax + 3y = −3
4x + 6y = 6.

Solve for x and y after fixing the temporary breakdown by a row exchange.


8 For which three numbers k does elimination break down? Which is fixed by a row
exchange? In each case, is the number of solutions 0 or 1 or ∞?

kx + 3y = 6
3x + ky = −6.


9 What test on b1 and b2 decides whether these two equations allow a solution? How
many solutions will they have? Draw the column picture for b = (1, 2) and (1, 0).

3x − 2y = b1
6x − 4y = b2.


10 In the xy plane, draw the lines x + y = 5 and x + 2y = 6 and the equation
y = ____ that comes from elimination. The line 5x − 4y = c will go through the
solution of these equations if c = ____.


Problems 11-20 study elimination on 3 by 3 systems (and possible failure).


11 (Recommended) A system of linear equations can’t have exactly two solutions. Why?


(a) If (x, y,z) and (X, Y, Z) are two solutions, what is another solution? 
(b) If 25 planes meet at two points, where else do they meet? 


12 Reduce this system to upper triangular form by two row operations:

2x + 3y + z = 8
4x + 7y + 5z = 20
−2y + 2z = 0.

Circle the pivots. Solve by back substitution for z, y, x.


13 Apply elimination (circle the pivots) and back substitution to solve

2x − 3y = 3
4x − 5y + z = 7
2x − y − 3z = 5.


List the three row operations: Subtract ____ times row ____ from row ____.


14 Which number d forces a row exchange, and what is the triangular system (not
singular) for that d? Which d makes this system singular (no third pivot)?

2x + 5y + 2z = 0
4x + dy + z = 2
y − z = 3.


15 Which number b leads later to a row exchange? Which b leads to a missing pivot?
In that singular case find a nonzero solution x, y, z.


x + by = 0
x − 2y − z = 0
y + z = 0.


16 (a) Construct a 3 by 3 system that needs two row exchanges to reach a triangular
form and a solution. 


(b) Construct a 3 by 3 system that needs a row exchange to keep going, but breaks 
down later. 


17 If rows 1 and 2 are the same, how far can you get with elimination (allowing row
exchange)? If columns 1 and 2 are the same, which pivot is missing?


Equal     2x − y + z = 0         2x + 2y + z = 0      Equal
rows      2x − y + z = 0         4x + 4y + z = 0      columns
          4x + y + z = 2         6x + 6y + z = 2.


18 Construct a 3 by 3 example that has 9 different coefficients on the left side, but
rows 2 and 3 become zero in elimination. How many solutions to your system with
b = (1, 10, 100) and how many with b = (0, 0, 0)?


19 Which number q makes this system singular and which right side t gives it infinitely
many solutions? Find the solution that has z = 1.


x + 4y − 2z = 1
x + 7y − 6z = 6
3y + qz = t.


20 Three planes can fail to have an intersection point, even if no planes are parallel. The
system is singular if row 3 of A is a ____ of the first two rows. Find a third equation
that can’t be solved together with x + y + z = 0 and x − 2y − z = 1.


21 Find the pivots and the solution for both systems (Ax = b and Kx = b):


2x + y = 0                   2x − y = 0
x + 2y + z = 0               −x + 2y − z = 0
y + 2z + t = 0               −y + 2z − t = 0
z + 2t = 5                   −z + 2t = 5.


22 If you extend Problem 21 following the 1, 2, 1 pattern or the −1, 2, −1 pattern, what
is the fifth pivot? What is the nth pivot? K is my favorite matrix. 


23 If elimination leads to x + y = 1 and 2y = 3, find three possible original problems.


24 For which two numbers a will elimination fail on A = [a 2; a a]?


25 For which three numbers a will elimination fail to give three pivots?


A = [a 2 3; a a 4; a a a] is singular for three values of a.


26 Look for a matrix that has row sums 4 and 8, and column sums 2 and s:


Matrix = [a b; c d]       a + b = 4     a + c = 2
                          c + d = 8     b + d = s


The four equations are solvable only if s = ____. Then find two different matrices
that have the correct row and column sums. Extra credit: Write down the 4 by 4
system Ax = b with x = (a, b, c, d) and make A triangular by elimination.


27 Elimination in the usual order gives what matrix U and what solution to this “lower
triangular” system? We are really solving by forward substitution:


3x = 3
6x + 2y = 8
9x − 2y + z = 9.


28 Create a MATLAB command A(2,:) = ... for the new row 2, to subtract 3 times row
1 from the existing row 2 if the matrix A is already known.


Challenge Problems 


29 Find experimentally the average 1st and 2nd and 3rd pivot sizes from MATLAB’s
[L,U] = lu(rand(3)). The average size abs(U(1,1)) is above 1/2 because lu picks
the largest available pivot in column 1. Here A = rand(3) has random entries
between 0 and 1.


30 If the last corner entry is A(5,5) = 11 and the last pivot of A is U(5,5) = 4, what
different entry A(5,5) would have made A singular?


31 Suppose elimination takes A to U without row exchanges. Then row j of U is a
combination of which rows of A? If Ax = 0, is Ux = 0? If Ax = b, is Ux = b?
If A starts out lower triangular, what is the upper triangular U?


32 Start with 100 equations Ax = 0 for 100 unknowns x = (x1, ..., x100). Suppose
elimination reduces the 100th equation to 0 = 0, so the system is “singular”.


(a) Elimination takes linear combinations of the rows. So this singular system has
the singular property: Some linear combination of the 100 rows is ____.


(b) Singular systems Ax = 0 have infinitely many solutions. This means that some
linear combination of the 100 columns is ____.


(c) Invent a 100 by 100 singular matrix with no zero entries. 


(d) For your matrix, describe in words the row picture and the column picture of 
Ax = 0. Not necessary to draw 100-dimensional space.


2.3 Elimination Using Matrices


We now combine two ideas—elimination and matrices. The goal is to express all the steps 
of elimination (and the final result) in the clearest possible way. In a 3 by 3 example, 
elimination could be described in words. For larger systems, a long list of steps would be 
hopeless. You will see how to subtract a multiple of row j from row i—using a matrix E. 


The 3 by 3 example in the previous section has the beautifully short form Ax = b: 


  2x1 + 4x2 − 2x3 = 2
  4x1 + 9x2 − 3x3 = 8       is the same as    [2 4 −2; 4 9 −3; −2 −3 7] [x1; x2; x3] = [2; 8; 10].     (1)
 −2x1 − 3x2 + 7x3 = 10


The nine numbers on the left go into the matrix A. That matrix not only sits beside x, it 
multiplies x. The rule for “A times x” is exactly chosen to yield the three equations. 


Review of A times x. A matrix times a vector gives a vector. The matrix is square when 
the number of equations (three) matches the number of unknowns (three). Our matrix is 
3 by 3. A general square matrix is n by n. Then the vector x is in n-dimensional space. 


The unknown in R³ is x = [x1; x2; x3] and the solution is x = [−1; 2; 2].


Key point: Ax = b represents the row form and also the column form of the equations. 


Column form     Ax = (−1) [2; 4; −2] + 2 [4; 9; −3] + 2 [−2; −3; 7] = [2; 8; 10] = b.


This rule for Ax is used so often that we express it once more for emphasis.


Ax is a combination of the columns of A.

Ax = x1 times (column 1) + ··· + xn times (column n).


When we compute the components of Ax, we use the row form of matrix multiplication.
The ith component is a dot product with row i of A, which is [ai1 ai2 ... ain].
The short formula for that dot product with x uses “sigma notation”.


Components of Ax are dot products with rows of A.

The ith component of Ax is ai1 x1 + ai2 x2 + ··· + ain xn. This is Σ_{j=1}^{n} aij xj.


The sigma symbol Σ is an instruction to add.¹ Start with j = 1 and stop with j = n.
Start the sum with ai1 x1 and stop with ain xn. That produces (row i) · x.


¹ Einstein shortened this even more by omitting the Σ. The repeated j in aij xj automatically meant addition.
He also wrote the sum as a_i^j x_j. Not being Einstein, we include the Σ.


One point to repeat about matrix notation: The entry in row 1, column 1 (the top left
corner) is a11. The entry in row 1, column 3 is a13. The entry in row 3, column 1 is a31.
(Row number comes before column number.) The word “entry” for a matrix corresponds
to “component” for a vector. General rule: aij = A(i, j) is in row i, column j.


Example 1 This matrix has aij = 2i + j. Then a11 = 3. Also a12 = 4 and a21 = 5.
Here is Ax with numbers and letters:


[3 4; 5 6] [2; 1] = [3·2 + 4·1; 5·2 + 6·1]        [a11 a12; a21 a22] [x1; x2] = [a11 x1 + a12 x2; a21 x1 + a22 x2].


The first component of Ax is 6 + 4 = 10. A row times a column gives a dot product.
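

Both ways of computing Ax can be put side by side. The Python sketch below is only an illustration, with invented function names ax_by_rows and ax_by_columns and with matrices stored as lists of rows; it builds the matrix of Example 1 from the rule aij = 2i + j.

```python
def ax_by_rows(A, x):
    """Component i of Ax is (row i of A) dotted with x: the sum over j of a_ij * x_j."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]

def ax_by_columns(A, x):
    """Ax is x_1 times (column 1) + ... + x_n times (column n)."""
    result = [0] * len(A)
    for j, x_j in enumerate(x):
        for i in range(len(A)):
            result[i] += x_j * A[i][j]       # add x_j times column j, entry by entry
    return result

# Example 1: a_ij = 2i + j, with i and j counted from 1.
A = [[2 * i + j for j in (1, 2)] for i in (1, 2)]
x = [2, 1]
print(A, ax_by_rows(A, x), ax_by_columns(A, x))

# The 3 by 3 matrix of equation (1) with the solution (-1, 2, 2).
A3 = [[2, 4, -2], [4, 9, -3], [-2, -3, 7]]
print(ax_by_rows(A3, [-1, 2, 2]), ax_by_columns(A3, [-1, 2, 2]))
```

The two functions always agree; they only differ in the order in which the same products aij xj are added.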


The Matrix Form of One Elimination Step 


Ax = b is a convenient form for the original equation. What about the elimination steps? 
The first step in this example subtracts 2 times the first equation from the second equation. 
On the right side, 2 times the first component of b is subtracted from the second component: 


First step     b = [2; 8; 10]   changes to   bnew = [2; 4; 10].


We want to do that subtraction with a matrix! The same result bnew = Eb is achieved
when we multiply an “elimination matrix” E times b. It subtracts 2b1 from b2:


The elimination matrix is     E = [1 0 0; −2 1 0; 0 0 1].


Multiplication by E subtracts 2 times row 1 from row 2. Rows 1 and 3 stay the same:


[1 0 0; −2 1 0; 0 0 1] [2; 8; 10] = [2; 4; 10]        [1 0 0; −2 1 0; 0 0 1] [b1; b2; b3] = [b1; b2 − 2b1; b3]


The first and third rows of E are rows from the identity matrix I. The new second
component is the number 4 that appeared after the elimination step. This is b2 − 2b1.

It is easy to describe the “elementary matrices” or “elimination matrices” like this E.
Start with the identity matrix I. Change one of its zeros to the multiplier −ℓ:


Example 2 The matrix E31 has −ℓ in the 3, 1 position:


Identity   I = [1 0 0; 0 1 0; 0 0 1]        Elimination   E31 = [1 0 0; 0 1 0; −ℓ 0 1]


When you multiply I times b, you get b. But E31 subtracts ℓ times the first component
from the third component. With ℓ = 4 this example gives 9 − 4 = 5:


Ib = [1 0 0; 0 1 0; 0 0 1] [1; 3; 9] = [1; 3; 9]    and    Eb = [1 0 0; 0 1 0; −4 0 1] [1; 3; 9] = [1; 3; 5]
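

The recipe "start with the identity and change one zero to −ℓ" translates directly into code. This is a small Python sketch under stated assumptions: the names identity, elimination_matrix and matvec are invented for the illustration, and rows and columns are numbered from 1 to match the text.

```python
def identity(n):
    """The n by n identity matrix as a list of rows."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

def elimination_matrix(n, i, j, multiplier):
    """E_ij: the identity with -multiplier in row i, column j (numbered from 1)."""
    E = identity(n)
    E[i - 1][j - 1] = -multiplier
    return E

def matvec(M, v):
    """M times v: each component is a row of M dotted with v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

E21 = elimination_matrix(3, 2, 1, 2)     # subtracts 2 times component 1 from component 2
E31 = elimination_matrix(3, 3, 1, 4)     # subtracts 4 times component 1 from component 3

print(E21, matvec(E21, [2, 8, 10]))
print(E31, matvec(E31, [1, 3, 9]))
```

The first product should reproduce bnew = (2, 4, 10) from the first elimination step, and the second should reproduce (1, 3, 5) from Example 2.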


What about the left side of Ax = b? Both sides are multiplied by E31. The purpose of 
E31 is to produce a zero in the (3, 1) position of the matrix. 

The notation fits this purpose. Start with A. Apply E’s to produce zeros below the 
pivots (the first E is E21). End with a triangular U. We now look in detail at those steps. 

First a small point. The vector x stays the same. The solution is not changed by 
elimination. (That may be more than a small point.) It is the coefficient matrix that is 
changed. When we start with Ax = b and multiply by E, the result is EAx = Eb.
The new matrix EA is the result of multiplying E times A.


Confession The elimination matrices Eij are great examples, but you won’t see them
later. They show how a matrix acts on rows. By taking several elimination steps, we will 
see how to multiply matrices (and the order of the E’s becomes important). Products and 
inverses are especially clear for E’s. It is those two ideas that the book will now use. 


Matrix Multiplication 


The big question is: How do we multiply two matrices? When the first matrix is E,
we already know what to expect for EA. This particular E subtracts 2 times row 1 from
row 2 of this matrix A and any matrix. The multiplier is ℓ = 2:


EA = [1 0 0; −2 1 0; 0 0 1] [2 4 −2; 4 9 −3; −2 −3 7] = [2 4 −2; 0 1 1; −2 −3 7]    (with the zero).     (2)


This step does not change rows 1 and 3 of A. Those rows are unchanged in EA: only
row 2 is different. Twice the first row has been subtracted from the second row. Matrix 
multiplication agrees with elimination—and the new system of equations is EAx = Eb. 
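

Equation (2) can be checked with a few lines of Python. The sketch assumes the usual row-times-column rule for each entry (the rule the text develops in the next section), and the function name matmul is invented here.

```python
def matmul(X, Y):
    """Entry (i, k) of XY is (row i of X) dotted with (column k of Y)."""
    columns_of_Y = list(zip(*Y))
    return [[sum(a * b for a, b in zip(row, col)) for col in columns_of_Y] for row in X]

E = [[1, 0, 0], [-2, 1, 0], [0, 0, 1]]
A = [[2, 4, -2], [4, 9, -3], [-2, -3, 7]]

EA = matmul(E, A)
for row in EA:
    print(row)
```

Rows 1 and 3 of the printed product are rows 1 and 3 of A; row 2 is (row 2 of A) minus 2 times (row 1 of A).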

EAx is simple but it involves a subtle idea. Start with Ax = b. Multiplying both 
sides by E gives E(Ax) = Eb. With matrix multiplication, this is also (EA)x = Eb. 
The first was E times Ax, the second is EA times x. They are the same. Parentheses 
are not needed. We just write EAx.

That rule extends to a matrix C with several column vectors like C = [c1 c2 c3]. When
multiplying EAC, you can do AC first or EA first. This is the point of an “associative
law” like 3 × (4 × 5) = (3 × 4) × 5. Multiply 3 times 20, or multiply 12 times 5. Both
answers are 60. That law seems so clear that it is hard to imagine it could be false.

The “commutative law” 3 × 4 = 4 × 3 looks even more obvious. But EA is usually
different from AE. When E multiplies on the right, it acts on the columns of A.


Associative law is true           A(BC) = (AB)C
Commutative law is false          Often AB ≠ BA
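

Both laws can be tested numerically on the E and A of this section. This Python sketch uses invented helper names matmul and matvec and takes x = (−1, 2, 2), the solution found earlier, as the test vector; one example does not prove the associative law, it only illustrates it.

```python
def matmul(X, Y):
    """Entry (i, k) of XY is (row i of X) dotted with (column k of Y)."""
    columns_of_Y = list(zip(*Y))
    return [[sum(a * b for a, b in zip(row, col)) for col in columns_of_Y] for row in X]

def matvec(M, v):
    """M times v: each component is a row of M dotted with v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

E = [[1, 0, 0], [-2, 1, 0], [0, 0, 1]]
A = [[2, 4, -2], [4, 9, -3], [-2, -3, 7]]
x = [-1, 2, 2]

left = matvec(E, matvec(A, x))        # E times (Ax)
right = matvec(matmul(E, A), x)       # (EA) times x
print("E(Ax) == (EA)x:", left == right)

EA = matmul(E, A)                     # E on the left acts on the rows of A
AE = matmul(A, E)                     # E on the right acts on the columns of A
print("EA == AE:", EA == AE)
print(AE)
```

In AE the first column has changed: it is (column 1 of A) minus 2 times (column 2 of A), while EA changed a row instead.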


There is another requirement on matrix multiplication. Suppose B has only one column 
(this column is b). The matrix-matrix law for EB should agree with the matrix-vector
law for Eb. Even more, we should be able to multiply matrices EB a column at a time:


If B has several columns b1, b2, b3, then the columns of EB are Eb1, Eb2, Eb3.


AB = A [b1 b2 b3] = [Ab1 Ab2 Ab3].


This holds true for the matrix multiplication in (2). If you multiply column 3 of A by 
E, you correctly get column 3 of EA: 


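$$
\begin{bmatrix} 1 & 0 & 0 \\ -2 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} -2 \\ -3 \\ 7 \end{bmatrix}
= \begin{bmatrix} -2 \\ 1 \\ 7 \end{bmatrix}
\qquad E(\text{column } j \text{ of } A) = \text{column } j \text{ of } EA.
$$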


This requirement deals with columns, while elimination is applied to rows. The next
section describes each entry of every product AB. The beauty of matrix multiplication 
is that all three approaches (rows, columns, whole matrices) come out right. 
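
A short check of the two claims above, as a sketch: E on the left acts on rows, E on the right acts on columns, and EA can be built one column at a time. The matrices are the E and A of equation (2); the helper names are assumptions made for this illustration.

```python
def matmul(M, N):
    """Rows of M times columns of N."""
    cols = list(zip(*N))
    return [[sum(m * n for m, n in zip(row, col)) for col in cols] for row in M]

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def column(M, j):
    """Column j of M as a list (0-based index)."""
    return [row[j] for row in M]

E = [[1, 0, 0], [-2, 1, 0], [0, 0, 1]]
A = [[2, 4, -2], [4, 9, -3], [-2, -3, 7]]

EA = matmul(E, A)   # E on the left: row 2 minus 2 * row 1
AE = matmul(A, E)   # E on the right: column 1 minus 2 * column 2

# Column j of EA is E times column j of A, for every j.
columns_agree = all(column(EA, j) == matvec(E, column(A, j)) for j in range(3))

print(EA)
print(AE)
print(columns_agree)
```

The two products differ, which is the failure of the commutative law in a concrete case.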


The Matrix $P_{ij}$ for a Row Exchange


To subtract row j from row i we use $E_{ij}$. To exchange or “permute” those rows we use
another matrix $P_{ij}$ (a permutation matrix). A row exchange is needed when zero is in the
pivot position. Lower down, that pivot column may contain a nonzero. By exchanging the 
two rows, we have a pivot and elimination goes forward. 

What matrix P23 exchanges row 2 with row 3? We can find it by exchanging rows of 
the identity matrix I: 


$$
\text{Permutation matrix} \qquad P_{23} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{bmatrix}
$$


This is a row exchange matrix. Multiplying by P23 exchanges components 2 and 3 of any 
column vector. Therefore it also exchanges rows 2 and 3 of any matrix: 


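$$
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{bmatrix}
\begin{bmatrix} 1 \\ 3 \\ 5 \end{bmatrix}
= \begin{bmatrix} 1 \\ 5 \\ 3 \end{bmatrix}
\quad \text{and} \quad
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{bmatrix}
\begin{bmatrix} 2 & 4 & 1 \\ 0 & 0 & 3 \\ 0 & 6 & 5 \end{bmatrix}
= \begin{bmatrix} 2 & 4 & 1 \\ 0 & 6 & 5 \\ 0 & 0 & 3 \end{bmatrix}
$$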


On the right, P23 is doing what it was created for. With zero in the second pivot position
and “6” below it, the exchange puts 6 into the pivot.

Matrices act. They don’t just sit there. We will soon meet other permutation matrices,
which can change the order of several rows. Rows 1, 2, 3 can be moved to 3, 1, 2. Our P23 
is one particular permutation matrix—it exchanges rows 2 and 3. 
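
The recipe "exchange rows of the identity" translates directly into code. This is a minimal sketch; the function names `identity`, `exchange_matrix` and `matmul` are assumptions for illustration, and rows are numbered from 1 as in the text.

```python
def identity(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

def exchange_matrix(n, i, j):
    """Identity matrix with rows i and j exchanged (1-based row numbers)."""
    P = identity(n)
    P[i - 1], P[j - 1] = P[j - 1], P[i - 1]
    return P

def matmul(M, N):
    cols = list(zip(*N))
    return [[sum(m * n for m, n in zip(row, col)) for col in cols] for row in M]

P23 = exchange_matrix(3, 2, 3)
A = [[2, 4, 1], [0, 0, 3], [0, 6, 5]]   # zero in the second pivot position

print(P23)
print(matmul(P23, A))      # rows 2 and 3 exchanged: 6 moves into the pivot
print(matmul(P23, P23))    # exchanging twice restores the identity
```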


To exchange equations 1 and 3 multiply by $P_{13} = \begin{bmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{bmatrix}$.

Usually row exchanges are not required. The odds are good that elimination uses only
the $E_{ij}$. But the $P_{ij}$ are ready if needed, to move a pivot up to the diagonal.


The Augmented Matrix 


This book eventually goes far beyond elimination. Matrices have all kinds of practical 
applications, in which they are multiplied. Our best starting point was a square E times a
square A, because we met this in elimination—and we know what answer to expect for EA. 
The next step is to allow a rectangular matrix. It still comes from our original equations, 
but now it includes the right side b. 

Key idea: Elimination does the same row operations to A and to b. We can include 
b as an extra column and follow it through elimination. The matrix A is enlarged or 
“augmented” by the extra column b: 


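$$
\text{Augmented matrix} \qquad [\,A \;\; b\,] = \begin{bmatrix} 2 & 4 & -2 & 2 \\ 4 & 9 & -3 & 8 \\ -2 & -3 & 7 & 10 \end{bmatrix}
$$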


Elimination acts on whole rows of this matrix. The left side and right side are both
multiplied by E, to subtract 2 times equation 1 from equation 2. With [A b] those steps
happen together:


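$$
\begin{bmatrix} 1 & 0 & 0 \\ -2 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} 2 & 4 & -2 & 2 \\ 4 & 9 & -3 & 8 \\ -2 & -3 & 7 & 10 \end{bmatrix}
= \begin{bmatrix} 2 & 4 & -2 & 2 \\ 0 & 1 & 1 & 4 \\ -2 & -3 & 7 & 10 \end{bmatrix}
$$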


The new second row contains 0, 1, 1,4. The new second equation is x2 + x3 = 4. Matrix 
multiplication works by rows and at the same time by columns: 


ROWS       Each row of E acts on [A b] to give a row of [EA Eb].
COLUMNS    E acts on each column of [A b] to give a column of [EA Eb].
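
The whole elimination on [A b] can be sketched as a sequence of multiplications by elimination matrices. The helper `elimination_matrix` and the loop below are assumptions made for illustration; the sketch takes every multiplier as entry divided by pivot and assumes no row exchange is needed, which holds for this A.

```python
from fractions import Fraction

def matmul(M, N):
    cols = list(zip(*N))
    return [[sum(m * n for m, n in zip(row, col)) for col in cols] for row in M]

def elimination_matrix(n, i, j, ell):
    """Identity with -ell in position (i, j): subtracts ell * row j from row i (1-based)."""
    E = [[Fraction(int(r == c)) for c in range(n)] for r in range(n)]
    E[i - 1][j - 1] = -ell
    return E

A = [[2, 4, -2], [4, 9, -3], [-2, -3, 7]]
b = [2, 8, 10]

# Augmented matrix [A b]: b rides along as an extra column.
M = [[Fraction(v) for v in row] + [Fraction(bi)] for row, bi in zip(A, b)]

for i, j in [(2, 1), (3, 1), (3, 2)]:        # E21, then E31, then E32
    ell = M[i - 1][j - 1] / M[j - 1][j - 1]  # multiplier = entry / pivot
    M = matmul(elimination_matrix(3, i, j, ell), M)

print(M)   # upper triangular U with the new right side in the last column
```

Back substitution on the triangular result is then a separate, simple step.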


Notice again that word “acts.” This is essential. Matrices do something! The matrix A 
acts on x to produce b. The matrix E operates on A to give EA. The whole process of 
elimination is a sequence of row operations, alias matrix multiplications. A goes to E21 A 
which goes to E31 E21 A. Finally E32 E31 E21 A is a triangular matrix.

The right side is included in the augmented matrix. The end result is a triangular system
of equations. We stop for exercises on multiplication by E, before writing down the rules
for all matrix multiplications (including block multiplication).


■ REVIEW OF THE KEY IDEAS ■


1. Ax = $x_1$ times column 1 + ··· + $x_n$ times column n. And $(Ax)_i = \sum_{j=1}^{n} a_{ij} x_j$.

2. Identity matrix = I, elimination matrix = $E_{ij}$ using $\ell_{ij}$, exchange matrix = $P_{ij}$.

3. Multiplying Ax = b by E21 subtracts a multiple $\ell_{21}$ of equation 1 from equation 2.
The number $-\ell_{21}$ is the (2, 1) entry of the elimination matrix E21.

4. For the augmented matrix [A b], that elimination step gives [E21 A  E21 b].

5. When A multiplies any matrix B, it multiplies each column of B separately.


■ WORKED EXAMPLES ■


2.3 A What 3 by 3 matrix E21 subtracts 4 times row 1 from row 2? What matrix P32
exchanges row 2 and row 3? If you multiply A on the right instead of the left, describe the
results AE21 and AP32.


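Solution By doing those operations on the identity matrix I, we find

$$
E_{21} = \begin{bmatrix} 1 & 0 & 0 \\ -4 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\quad \text{and} \quad
P_{32} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{bmatrix}
$$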


Multiplying by E21 on the right side will subtract 4 times column 2 from column 1. 
Multiplying by P32 on the right will exchange columns 2 and 3. 


2.3B Write down the augmented matrix [A b] with an extra column: 


$$
\begin{aligned}
x + 2y + 2z &= 1 \\
4x + 8y + 9z &= 3 \\
3y + 2z &= 1
\end{aligned}
$$


Apply E21 and then P32 to reach a triangular system. Solve by back substitution. What 
combined matrix P32 E21 will do both steps at once?

Solution E21 removes the 4 in column 1. But zero appears in column 2:

$$
[\,A \;\; b\,] = \begin{bmatrix} 1 & 2 & 2 & 1 \\ 4 & 8 & 9 & 3 \\ 0 & 3 & 2 & 1 \end{bmatrix}
\quad \text{and} \quad
E_{21}[\,A \;\; b\,] = \begin{bmatrix} 1 & 2 & 2 & 1 \\ 0 & 0 & 1 & -1 \\ 0 & 3 & 2 & 1 \end{bmatrix}
$$


Now P32 exchanges rows 2 and 3. Back substitution produces z then y and x. 


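$$
P_{32}E_{21}[\,A \;\; b\,] = \begin{bmatrix} 1 & 2 & 2 & 1 \\ 0 & 3 & 2 & 1 \\ 0 & 0 & 1 & -1 \end{bmatrix}
\quad \text{and} \quad
\begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 1 \\ 1 \\ -1 \end{bmatrix}
$$

For the matrix P32 E21 that does both steps at once, apply P32 to E21.

$$
\text{One matrix, both steps} \qquad P_{32}E_{21} = \text{exchange the rows of } E_{21} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ -4 & 1 & 0 \end{bmatrix}
$$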


2.3 C Multiply these matrices in two ways. First, rows of A times columns of B. 
Second, columns of A times rows of B. That unusual way produces two matrices that 
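add to AB. How many separate ordinary multiplications are needed?

$$
\text{Both ways} \qquad AB = \begin{bmatrix} 3 & 4 \\ 1 & 5 \\ 2 & 0 \end{bmatrix}
\begin{bmatrix} 2 & 4 \\ 1 & 1 \end{bmatrix}
= \begin{bmatrix} 10 & 16 \\ 7 & 9 \\ 4 & 8 \end{bmatrix}
$$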


Solution Rows of A times columns of B are dot products of vectors: 


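$$
(\text{row } 1) \cdot (\text{column } 1) = \begin{bmatrix} 3 & 4 \end{bmatrix} \begin{bmatrix} 2 \\ 1 \end{bmatrix} = 10 \quad \text{is the } (1,1) \text{ entry of } AB
$$

$$
(\text{row } 2) \cdot (\text{column } 1) = \begin{bmatrix} 1 & 5 \end{bmatrix} \begin{bmatrix} 2 \\ 1 \end{bmatrix} = 7 \quad \text{is the } (2,1) \text{ entry of } AB
$$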


We need 6 dot products, 2 multiplications each, 12 in all (3 · 2 · 2). The same AB comes
from columns of A times rows of B. A column times a row is a matrix. 


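$$
AB = \begin{bmatrix} 3 \\ 1 \\ 2 \end{bmatrix} \begin{bmatrix} 2 & 4 \end{bmatrix}
+ \begin{bmatrix} 4 \\ 5 \\ 0 \end{bmatrix} \begin{bmatrix} 1 & 1 \end{bmatrix}
= \begin{bmatrix} 6 & 12 \\ 2 & 4 \\ 4 & 8 \end{bmatrix}
+ \begin{bmatrix} 4 & 4 \\ 5 & 5 \\ 0 & 0 \end{bmatrix}
$$


Problem Set 2.3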


Problems 1-15 are about elimination matrices. 


1 Write down the 3 by 3 matrices that produce these elimination steps: 


(a) E21 subtracts 5 times row 1 from row 2.
(b) E32 subtracts −7 times row 2 from row 3.
(c) P exchanges rows 1 and 2, then rows 2 and 3.


2 In Problem 1, applying E21 and then E32 to b = (1, 0, 0) gives E32 E21 b = ____.
Applying E32 before E21 gives E21 E32 b = ____. When E32 comes first,
row ____ feels no effect from row ____.


3 Which three matrices E21, E31, E32 put A into triangular form U? 


$$
A = \begin{bmatrix} 1 & 1 & 0 \\ 4 & 6 & 1 \\ -2 & 2 & 0 \end{bmatrix}
\quad \text{and} \quad E_{32}E_{31}E_{21}A = U.
$$


Multiply those E’s to get one matrix M that does elimination: MA = U.


4 Include b = (1, 0, 0) as a fourth column in Problem 3 to produce [A b]. Carry out
the elimination steps on this augmented matrix to solve Ax = b. 


5 Suppose a33 = 7 and the third pivot is 5. If you change a33 to 11, the third pivot is
____. If you change a33 to ____, there is no third pivot.


6 If every column of A is a multiple of (1, 1, 1), then Ax is always a multiple of
(1, 1, 1). Do a 3 by 3 example. How many pivots are produced by elimination?


7 Suppose E subtracts 7 times row 1 from row 3. 


(a) To invert that step you should ____ 7 times row ____ to row ____.
(b) What “inverse matrix” $E^{-1}$ takes that reverse step (so $E^{-1}E = I$)?
(c) If the reverse step is applied first (and then E) show that $EE^{-1} = I$.


8 The determinant of $M = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$ is det M = ad − bc. Subtract ℓ times row 1
from row 2 to produce a new M*. Show that det M* = det M for every ℓ. When
ℓ = c/a, the product of pivots equals the determinant: (a)(d − ℓb) equals ad − bc.


9 (a) E21 subtracts row 1 from row 2 and then P23 exchanges rows 2 and 3. What
matrix M = P23 E21 does both steps at once?

(b) P23 exchanges rows 2 and 3 and then E31 subtracts row 1 from row 3. What
matrix M = E31 P23 does both steps at once? Explain why the M’s are the
same but the E’s are different.


10 (a) What 3 by 3 matrix E13 will add row 3 to row 1?
(b) What matrix adds row 1 to row 3 and at the same time row 3 to row 1?
(c) What matrix adds row 1 to row 3 and then adds row 3 to row 1?


11 Create a matrix that has a11 = a22 = a33 = 1 but elimination produces two negative
pivots without row exchanges. (The first pivot is 1.)


12 Multiply these matrices:

$$
\begin{bmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{bmatrix}
\begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix}
\begin{bmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{bmatrix}
\qquad
\begin{bmatrix} 1 & 0 & 0 \\ -1 & 1 & 0 \\ -1 & 0 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 2 & 3 \\ 1 & 3 & 1 \\ 1 & 4 & 0 \end{bmatrix}
$$


13 Explain these facts. If the third column of B is all zero, the third column of EB is
all zero (for any E). If the third row of B is all zero, the third row of EB might not
be zero.


14 This 4 by 4 matrix will need elimination matrices E21 and E32 and E43. What are
those matrices?

$$
A = \begin{bmatrix} 2 & -1 & 0 & 0 \\ -1 & 2 & -1 & 0 \\ 0 & -1 & 2 & -1 \\ 0 & 0 & -1 & 2 \end{bmatrix}
$$


15 Write down the 3 by 3 matrix that has $a_{ij} = 2i - 3j$. This matrix has a32 = 0, but
elimination still needs E32 to produce a zero in the 3, 2 position. Which previous
step destroys the original zero and what is E32?


Problems 16-23 are about creating and multiplying matrices. 


16 Write these ancient problems in a 2 by 2 matrix form Ax = b and solve them:

(a) X is twice as old as Y and their ages add to 33.
(b) (x, y) = (2, 5) and (3, 7) lie on the line y = mx + c. Find m and c.


17 The parabola $y = a + bx + cx^2$ goes through the points (x, y) = (1, 4) and (2, 8)
and (3, 14). Find and solve a matrix equation for the unknowns (a, b, c).


18 Multiply these matrices in the orders EF and FE:

$$
E = \begin{bmatrix} 1 & 0 & 0 \\ a & 1 & 0 \\ b & 0 & 1 \end{bmatrix}
\qquad
F = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & c & 1 \end{bmatrix}
$$

Also compute $E^2 = EE$ and $F^3 = FFF$. You can guess $F^{100}$.


19 Multiply these row exchange matrices in the orders PQ and QP and $P^2$:

$$
P = \begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\quad \text{and} \quad
Q = \begin{bmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{bmatrix}
$$

Find another non-diagonal matrix whose square is $M^2 = I$.


20 (a) Suppose all columns of B are the same. Then all columns of EB are the same,
because each one is E times ____.
(b) Suppose all rows of B are [1 2 4]. Show by example that all rows of EB are
not [1 2 4]. It is true that those rows are ____.


21 If E adds row 1 to row 2 and F adds row 2 to row 1, does EF equal FE?


22 The entries of A and x are $a_{ij}$ and $x_j$. So the first component of Ax is $\sum a_{1j}x_j = a_{11}x_1 + \cdots + a_{1n}x_n$. If E21 subtracts row 1 from row 2, write a formula for

(a) the third component of Ax

(b) the (2, 1) entry of E21 A

(c) the (2, 1) entry of E21 (E21 A)

(d) the first component of E21 Ax.


23 The elimination matrix $E = \begin{bmatrix} 1 & 0 \\ -2 & 1 \end{bmatrix}$ subtracts 2 times row 1 of A from row 2 of A.
The result is EA. What is the effect of E(EA)? In the opposite order AE, we are
subtracting 2 times ____ of A from ____. (Do examples.)


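Problems 24–27 include the column b in the augmented matrix [A b].


24 Apply elimination to the 2 by 3 augmented matrix [A b]. What is the triangular
system Ux = c? What is the solution x?

$$
Ax = \begin{bmatrix} 2 & 3 \\ 4 & 1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 1 \\ 17 \end{bmatrix}
$$


25 Apply elimination to the 3 by 4 augmented matrix [A b]. How do you know this
system has no solution? Change the last number 6 so there is a solution.

$$
Ax = \begin{bmatrix} 1 & 2 & 3 \\ 2 & 3 & 4 \\ 3 & 5 & 7 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 1 \\ 2 \\ 6 \end{bmatrix}
$$


26 The equations Ax = b and Ax* = b* have the same matrix A. What double
augmented matrix should you use in elimination to solve both equations at once?

Solve both of these equations by working on a 2 by 4 matrix:

$$
\begin{bmatrix} 1 & 4 \\ 2 & 7 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 1 \\ 0 \end{bmatrix}
\quad \text{and} \quad
\begin{bmatrix} 1 & 4 \\ 2 & 7 \end{bmatrix} \begin{bmatrix} u \\ v \end{bmatrix} = \begin{bmatrix} 0 \\ 1 \end{bmatrix}
$$


27 Choose the numbers a, b, c, d in this augmented matrix so that there is (a) no solution
(b) infinitely many solutions.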


$$
[\,A \;\; b\,] = \begin{bmatrix} 1 & 2 & 3 & a \\ 0 & 4 & 5 & b \\ 0 & 0 & d & c \end{bmatrix}
$$

Which of the numbers a, b, c, or d have no effect on the solvability?
28 If AB = I and BC = I use the associative law to prove A = C. 


Challenge Problems 


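29 Find the triangular matrix E that reduces “Pascal’s matrix” to a smaller Pascal:

$$
\text{Eliminate column 1} \qquad
E \begin{bmatrix} 1 & 0 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ 1 & 2 & 1 & 0 \\ 1 & 3 & 3 & 1 \end{bmatrix}
= \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 1 & 1 & 0 \\ 0 & 1 & 2 & 1 \end{bmatrix}
$$

Which matrix M (multiplying several E’s) reduces Pascal all the way to I?
Pascal’s triangular matrix is exceptional, all of its multipliers are $\ell_{ij} = 1$.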


30 Write $M = \begin{bmatrix} 3 & 4 \\ 2 & 3 \end{bmatrix}$ as a product of many factors $A = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}$ and $B = \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix}$.


(a) What matrix E subtracts row 1 from row 2 to make row 2 of EM smaller?
(b) What matrix F subtracts row 2 of EM from row 1 to reduce row 1 of FEM?
(c) Continue E’s and F’s until (many E’s and F’s) times (M) is (A or B).

(d) E and F are the inverses of A and B! Moving all E’s and F’s to the right side
will give you the desired result M = product of A’s and B’s.


This is possible for integer matrices $M = \begin{bmatrix} a & b \\ c & d \end{bmatrix} > 0$ that have ad − bc = 1.


31 Find elimination matrices E21 then E32 then E43 to change K into U:

$$
E_{43}E_{32}E_{21} \begin{bmatrix} 2 & -1 & 0 & 0 \\ -1 & 2 & -1 & 0 \\ 0 & -1 & 2 & -1 \\ 0 & 0 & -1 & 2 \end{bmatrix}
= \begin{bmatrix} 2 & -1 & 0 & 0 \\ 0 & 3/2 & -1 & 0 \\ 0 & 0 & 4/3 & -1 \\ 0 & 0 & 0 & 5/4 \end{bmatrix}
$$

Apply those three steps to the identity matrix I, to multiply E43 E32 E21.


2.4 Rules for Matrix Operations


I will start with basic facts. A matrix is a rectangular array of numbers or “entries”. When 
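A has m rows and n columns, it is an “m by n” matrix. Matrices can be added if their
shapes are the same. They can be multiplied by any constant c. Here are examples of
A + B and 2A, for 3 by 2 matrices:

$$
\begin{bmatrix} 1 & 2 \\ 3 & 4 \\ 0 & 0 \end{bmatrix} + \begin{bmatrix} 2 & 2 \\ 4 & 4 \\ 9 & 9 \end{bmatrix} = \begin{bmatrix} 3 & 4 \\ 7 & 8 \\ 9 & 9 \end{bmatrix}
\quad \text{and} \quad
2 \begin{bmatrix} 1 & 2 \\ 3 & 4 \\ 0 & 0 \end{bmatrix} = \begin{bmatrix} 2 & 4 \\ 6 & 8 \\ 0 & 0 \end{bmatrix}
$$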


Matrices are added exactly as vectors are—one entry at a time. We could even regard a 
column vector as a matrix with only one column (so n = 1). The matrix —A comes from 
multiplication by c = —1 (reversing all the signs). Adding A to —A leaves the zero matrix, 
with all entries zero. All this is only common sense. 

The entry in row i and column j is called $a_{ij}$ or A(i, j). The n entries along the first
row are $a_{11}, a_{12}, \ldots, a_{1n}$. The lower left entry in the matrix is $a_{m1}$ and the lower right is
$a_{mn}$. The row number i goes from 1 to m. The column number j goes from 1 to n.


Matrix addition is easy. The serious question is matrix multiplication. When can we 
multiply A times B, and what is the product AB? We cannot multiply when A and B are 
3 by 2. They don’t pass the following test: 


To multiply AB: If A has n columns, B must have n rows. 


When A is 3 by 2, the matrix B can be 2 by 1 (a vector) or 2 by 2 (square) or 2 by 20. 
Every column of B is multiplied by A. I will begin matrix multiplication the dot product 
way, and then return to this column way: A times columns of B. The most important rule 
is that AB times C equals A times BC. A Challenge Problem will prove this. 

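Suppose A is m by n and B is n by p. We can multiply. The product AB is m by p.

$$
(m \times n)(n \times p) = (m \times p) \qquad
\begin{bmatrix} m \text{ rows} \\ n \text{ columns} \end{bmatrix}
\begin{bmatrix} n \text{ rows} \\ p \text{ columns} \end{bmatrix}
= \begin{bmatrix} m \text{ rows} \\ p \text{ columns} \end{bmatrix}
$$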


A row times a column is an extreme case. Then 1 by n multiplies n by 1. The result is 1 
by 1. That single number is the “dot product”. 

In every case AB is filled with dot products. For the top corner, the (1, 1) entry of AB 
is (row 1 of A) · (column 1 of B). To multiply matrices, take the dot product of each row
of A with each column of B. 


The entry in row i and column j of AB is (row i of A) · (column j of B).
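
The rule above, together with the shape test, fits in a few lines. This is a minimal sketch (the function name `matmul` and the error message are assumptions); it uses the triple loop implied by "dot product of each row of A with each column of B".

```python
def matmul(A, B):
    """Product of an m by n matrix A and an n by p matrix B, as lists of rows."""
    m, n = len(A), len(A[0])
    if len(B) != n:
        # The test from the text: if A has n columns, B must have n rows.
        raise ValueError(f"cannot multiply: A has {n} columns but B has {len(B)} rows")
    p = len(B[0])
    # Entry (i, j) is the dot product of row i of A with column j of B.
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(p)]
            for i in range(m)]

A = [[1, 1], [2, -1]]
B = [[2, 2], [3, 4]]
AB = matmul(A, B)
print(AB)

three_by_two = [[1, 2], [3, 4], [0, 0]]
try:
    matmul(three_by_two, three_by_two)   # 3 by 2 times 3 by 2 fails the test
    shapes_ok = True
except ValueError as err:
    shapes_ok = False
    print(err)
```

A row times a column is the extreme case: `matmul([[1, 2, 3]], [[0], [1], [2]])` returns a 1 by 1 matrix holding the dot product.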


Figure 2.8 picks out the second row (i = 2) of a 4 by 5 matrix A. It picks out the third 
column (j = 3) of a 5 by 6 matrix B. Their dot product goes into row 2 and column 3 
of AB. The matrix AB has as many rows as A (4 rows), and as many columns as B.

[Figure 2.8 diagram: row 2 of A and column 3 of B are marked, and their dot product is the entry $(AB)_{23}$. A is 4 by 5, B is 5 by 6, AB is 4 by 6.]

Figure 2.8: Here i = 2 and j = 3. Then $(AB)_{23}$ is (row 2) · (column 3) = $\sum a_{2k}b_{k3}$.


Example 1 Square matrices can be multiplied if and only if they have the same size: 


$$
\begin{bmatrix} 1 & 1 \\ 2 & -1 \end{bmatrix} \begin{bmatrix} 2 & 2 \\ 3 & 4 \end{bmatrix} = \begin{bmatrix} 5 & 6 \\ 1 & 0 \end{bmatrix}
$$

The first dot product is 1 · 2 + 1 · 3 = 5. Three more dot products give 6, 1, and 0. Each
dot product requires two multiplications, thus eight in all.

If A and B are n by n, so is AB. It contains $n^2$ dot products, row of A times column of
B. Each dot product needs n multiplications, so the computation of AB uses $n^3$ separate
multiplications. For n = 100 we multiply a million times. For n = 2 we have $n^3 = 8$.


Mathematicians thought until recently that AB absolutely needed $2^3 = 8$ multiplications.
Then somebody found a way to do it with 7 (and extra additions). By breaking n by
n matrices into 2 by 2 blocks, this idea also reduced the count for large matrices. Instead of
$n^3$ it went below $n^{2.8}$, and the exponent keeps falling.¹ The best at this moment is $n^{2.376}$.
But the algorithm is so awkward that scientific computing is done the regular way: $n^2$ dot
products in AB, and n multiplications for each one.
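
The text does not spell out the 7-multiplication method. The sketch below uses the formulas commonly attributed to Strassen, which is an assumption about which method is meant; the function name `strassen_2x2` is illustrative. Each of `m1` to `m7` costs one multiplication, and the rest is additions and subtractions.

```python
def strassen_2x2(A, B):
    """Multiply two 2 by 2 matrices with 7 multiplications (Strassen's formulas)."""
    (a11, a12), (a21, a22) = A
    (b11, b12), (b21, b22) = B
    m1 = (a11 + a22) * (b11 + b22)
    m2 = (a21 + a22) * b11
    m3 = a11 * (b12 - b22)
    m4 = a22 * (b21 - b11)
    m5 = (a11 + a12) * b22
    m6 = (a21 - a11) * (b11 + b12)
    m7 = (a12 - a22) * (b21 + b22)
    return [[m1 + m4 - m5 + m7, m3 + m5],
            [m2 + m4, m1 - m2 + m3 + m6]]

def ordinary_2x2(A, B):
    """The regular way: four dot products, 8 multiplications."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

A = [[1, 1], [2, -1]]
B = [[2, 2], [3, 4]]
print(strassen_2x2(A, B))
print(ordinary_2x2(A, B))
```

Because the formulas never use ab = ba, the entries may themselves be matrix blocks, which is how the idea extends to large n.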


Example 2 Suppose A is a row vector (1 by 3) and B is a column vector (3 by 1). Then 
AB is 1 by 1 (only one entry, the dot product). On the other hand B times A (a column 
times a row) is a full 3 by 3 matrix. This multiplication is allowed! 


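$$
\text{Column times row} \quad (n \times 1)(1 \times n) = (n \times n) \qquad
\begin{bmatrix} 0 \\ 1 \\ 2 \end{bmatrix} \begin{bmatrix} 1 & 2 & 3 \end{bmatrix}
= \begin{bmatrix} 0 & 0 & 0 \\ 1 & 2 & 3 \\ 2 & 4 & 6 \end{bmatrix}
$$


A row times a column is an “inner” product (that is another name for dot product). A column
times a row is an “outer” product. These are extreme cases of matrix multiplication.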


Rows and Columns of AB 


In the big picture, A multiplies each column of B. The result is a column of AB. In that
column, we are combining the columns of A. Each column of AB is a combination of
the columns of A. That is the column picture of matrix multiplication:

Matrix A times column of B        A [b1 ··· bp] = [Ab1 ··· Abp].


¹Maybe 2.376 will drop to 2. No other number looks special, but no change for 10 years.


The row picture is reversed. Each row of A multiplies the whole matrix B. The result is a 
row of AB. It is a combination of the rows of B: 


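$$
\text{Row times matrix} \qquad
\begin{bmatrix} \text{row } i \text{ of } A \end{bmatrix}
\begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix}
= \begin{bmatrix} \text{row } i \text{ of } AB \end{bmatrix}.
$$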


We see row operations in elimination (E times A). We see columns in A times x. The 
“row-column picture” has the dot products of rows with columns. Believe it or not, 
there is also a column-row picture. Not everybody knows that columns 1,...,n of A
multiply rows 1,...,n of B and add up to the same answer AB. Worked Example 2.3 C
had numbers for n = 2. Example 3 will show how to multiply AB using columns times 
rows. 


The Laws for Matrix Operations 


May I put on record six laws that matrices do obey, while emphasizing an equation they 
don’t obey? The matrices can be square or rectangular, and the laws involving A + B are 
all simple and all obeyed. Here are three addition laws: 


A+B=B+A (commutative law) 
c(A+ B)=cA+cB (distributive law) 
A+(B+C)=(A+B)+C (associative law). 


Three more laws hold for multiplication, but AB = BA is not one of them: 


AB ≠ BA                    (the commutative “law” is usually broken)
C(A + B) = CA + CB         (distributive law from the left)
(A + B)C = AC + BC         (distributive law from the right)
A(BC) = (AB)C              (associative law for ABC) (parentheses not needed).


When A and B are not square, AB is a different size from BA. These matrices can’t be 
equal—even if both multiplications are allowed. For square matrices, almost any example 
shows that AB is different from BA: 


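$$
AB = \begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}
\quad \text{but} \quad
BA = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}.
$$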


It is true that AI = IA. All square matrices commute with I and also with cI. Only these
matrices cI commute with all other matrices.

The law A(B + C) = AB + AC is proved a column at a time. Start with A(b +c) = 
Ab + Ac for the first column. That is the key to everything—linearity. Say no more. 


The law A(BC) = (AB)C means that you can multiply BC first or else AB first. 
The direct proof is sort of awkward (Problem 37) but this law is extremely useful. 
We highlighted it above; it is the key to the way we multiply matrices.

Look at the special case when A = B = C = square matrix. Then (A times $A^2$) is
equal to ($A^2$ times A). The product in either order is $A^3$. The matrix powers $A^p$ follow the
same rules as numbers:

$$
A^p = AAA \cdots A \ (p \text{ factors}) \qquad (A^p)(A^q) = A^{p+q} \qquad (A^p)^q = A^{pq}.
$$

Those are the ordinary laws for exponents. $A^3$ times $A^4$ is $A^7$ (seven factors). $A^3$ to
the fourth power is $A^{12}$ (twelve A’s). When p and q are zero or negative these rules still
hold, provided A has a “−1 power”, which is the inverse matrix $A^{-1}$. Then $A^0 = I$ is the
identity matrix (no factors).

For a number, $a^{-1}$ is 1/a. For a matrix, the inverse is written $A^{-1}$. (It is never I/A,
except this is allowed in MATLAB.) Every number has an inverse except a = 0. To decide
when A has an inverse is a central problem in linear algebra. Section 2.5 will start on the 
answer. This section is a Bill of Rights for matrices, to say when A and B can be multiplied 
and how. 


Block Matrices and Block Multiplication 


We have to say one more thing about matrices. They can be cut into blocks (which are 
smaller matrices). This often happens naturally. Here is a 4 by 6 matrix broken into blocks
of size 2 by 2. In this example each block is just I:

$$
\text{4 by 6 matrix, 2 by 2 blocks} \qquad
A = \left[\begin{array}{cc|cc|cc}
1 & 0 & 1 & 0 & 1 & 0 \\
0 & 1 & 0 & 1 & 0 & 1 \\ \hline
1 & 0 & 1 & 0 & 1 & 0 \\
0 & 1 & 0 & 1 & 0 & 1
\end{array}\right]
= \begin{bmatrix} I & I & I \\ I & I & I \end{bmatrix}
$$

If B is also 4 by 6 and the block sizes match, you can add A + B a block at a time.

We have seen block matrices before. The right side vector b was placed next to A in
the “augmented matrix”. Then [A b] has two blocks of different sizes. Multiplying by
an elimination matrix gave [EA Eb]. No problem to multiply blocks times blocks, when
their shapes permit.


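Block multiplication If the cuts between columns of A match the cuts between rows
of B, then block multiplication of AB is allowed:

$$
\begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}
\begin{bmatrix} B_{11} & \cdots \\ B_{21} & \cdots \end{bmatrix}
= \begin{bmatrix} A_{11}B_{11} + A_{12}B_{21} & \cdots \\ A_{21}B_{11} + A_{22}B_{21} & \cdots \end{bmatrix} \tag{1}
$$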


This equation is the same as if the blocks were numbers (which are 1 by 1 blocks). We are 
careful to keep A’s in front of B’s, because BA can be different. 
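
Block multiplication can be checked against the ordinary product. In this sketch the matrices, the cut positions and the helper names (`split`, `join`, `matadd`) are all illustrative assumptions. The one requirement taken from the text is that the column cut of A (after column 2) matches the row cut of B (after row 2).

```python
def matmul(M, N):
    cols = list(zip(*N))
    return [[sum(m * n for m, n in zip(row, col)) for col in cols] for row in M]

def matadd(M, N):
    return [[m + n for m, n in zip(r1, r2)] for r1, r2 in zip(M, N)]

def split(M, r, c):
    """Cut M after row r and after column c into blocks M11, M12, M21, M22."""
    return ([row[:c] for row in M[:r]], [row[c:] for row in M[:r]],
            [row[:c] for row in M[r:]], [row[c:] for row in M[r:]])

def join(M11, M12, M21, M22):
    """Reassemble four blocks into one matrix."""
    return ([a + b for a, b in zip(M11, M12)] +
            [a + b for a, b in zip(M21, M22)])

A = [[1, 2, 0, 1], [3, 1, 2, 0], [0, 4, 1, 5]]          # 3 by 4
B = [[2, 1, 0], [1, 0, 3], [0, 2, 1], [4, 1, 1]]        # 4 by 3

A11, A12, A21, A22 = split(A, 1, 2)   # columns of A cut after column 2
B11, B12, B21, B22 = split(B, 2, 1)   # rows of B cut after row 2 (matches A)

# Same pattern as 2 by 2 numbers, always keeping A blocks in front of B blocks.
C = join(matadd(matmul(A11, B11), matmul(A12, B21)),
         matadd(matmul(A11, B12), matmul(A12, B22)),
         matadd(matmul(A21, B11), matmul(A22, B21)),
         matadd(matmul(A21, B12), matmul(A22, B22)))

print(C == matmul(A, B))
```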


Main point When matrices split into blocks, it is often simpler to see how they act. The
block matrix of I’s above is much clearer than the original 4 by 6 matrix A.

Example 3 (Important special case) Let the blocks of A be its n columns. Let the
blocks of B be its n rows. Then block multiplication AB adds up columns times rows:

$$
\text{Columns times rows} \qquad
\begin{bmatrix} | & & | \\ a_1 & \cdots & a_n \\ | & & | \end{bmatrix}
\begin{bmatrix} - & b_1 & - \\ & \vdots & \\ - & b_n & - \end{bmatrix}
= a_1 b_1 + \cdots + a_n b_n. \tag{2}
$$


This is another way to multiply matrices. Compare it with the usual rows times columns. 
Row 1 of A times column 1 of B gave the (1, 1) entry in AB. Now column 1 of A times 
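row 1 of B gives a full matrix, not just a single number. Look at this example:

$$
\begin{bmatrix} 1 & 4 \\ 1 & 5 \end{bmatrix} \begin{bmatrix} 3 & 2 \\ 1 & 0 \end{bmatrix}
= \begin{bmatrix} 1 \\ 1 \end{bmatrix} \begin{bmatrix} 3 & 2 \end{bmatrix}
+ \begin{bmatrix} 4 \\ 5 \end{bmatrix} \begin{bmatrix} 1 & 0 \end{bmatrix}
= \begin{bmatrix} 3 & 2 \\ 3 & 2 \end{bmatrix} + \begin{bmatrix} 4 & 0 \\ 5 & 0 \end{bmatrix}. \tag{3}
$$

(Column 1 times row 1) + (Column 2 times row 2)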


We stop there so you can see columns multiplying rows. If a 2 by 1 matrix (a column) 
multiplies a 1 by 2 matrix (a row), the result is 2 by 2. That is what we found. Dot 
products are inner products and these are outer products. In the top left corner the answer
is 3 + 4 = 7. This agrees with the row-column dot product of (1, 4) with (3, 1). 


Summary The usual way, rows times columns, gives four dot products (8 multiplications). 
The new way, columns times rows, gives two full matrices (the same 8 multiplications). 
The 8 multiplications, and the 4 additions, are just executed in a different order. 
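
The columns-times-rows picture can be sketched with outer products. The numbers are those of Worked Example 2.3 C; the helper names `outer`, `matadd` and `matmul` are assumptions for illustration.

```python
def outer(col, row):
    """Column times row: a full matrix with entries col[i] * row[j]."""
    return [[c * r for r in row] for c in col]

def matadd(M, N):
    return [[m + n for m, n in zip(r1, r2)] for r1, r2 in zip(M, N)]

def matmul(M, N):
    """The usual way: rows of M times columns of N."""
    cols = list(zip(*N))
    return [[sum(m * n for m, n in zip(row, col)) for col in cols] for row in M]

A = [[3, 4], [1, 5], [2, 0]]
B = [[2, 4], [1, 1]]

# One outer product per column k of A and row k of B.
pieces = [outer([row[k] for row in A], B[k]) for k in range(len(B))]

total = pieces[0]
for piece in pieces[1:]:
    total = matadd(total, piece)

print(pieces)
print(total)
```

The sum of the pieces equals `matmul(A, B)`: the same multiplications, executed in a different order.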


Example 4 (Elimination by blocks) Suppose the first column of A contains 1, 3, 4. 
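To change 3 and 4 to 0 and 0, multiply the pivot row by 3 and 4 and subtract. Those
row operations are really multiplications by elimination matrices E21 and E31:

$$
\text{One at a time} \qquad
E_{21} = \begin{bmatrix} 1 & 0 & 0 \\ -3 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\quad \text{and} \quad
E_{31} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -4 & 0 & 1 \end{bmatrix}
$$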


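The “block idea” is to do both eliminations with one matrix E. That matrix clears out the
whole first column of A below the pivot a = 1:

$$
E = \begin{bmatrix} 1 & 0 & 0 \\ -3 & 1 & 0 \\ -4 & 0 & 1 \end{bmatrix}
\quad \text{multiplies} \quad
\begin{bmatrix} 1 & x & x \\ 3 & x & x \\ 4 & x & x \end{bmatrix}
\quad \text{to give} \quad
EA = \begin{bmatrix} 1 & x & x \\ 0 & x & x \\ 0 & x & x \end{bmatrix}
$$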


Using inverses from 2.5, a block matrix E can do elimination on a whole (block) column
of A. Suppose A has four blocks A, B, C, D. Watch how E multiplies A by blocks:

$$
\text{Block elimination} \qquad
\begin{bmatrix} I & 0 \\ -CA^{-1} & I \end{bmatrix}
\begin{bmatrix} A & B \\ C & D \end{bmatrix}
= \begin{bmatrix} A & B \\ 0 & D - CA^{-1}B \end{bmatrix}. \tag{4}
$$

Elimination multiplies the first row [A B] by $CA^{-1}$ (previously c/a). It subtracts from
C to get a zero block in the first column. It subtracts from D to get $S = D - CA^{-1}B$.

This is ordinary elimination, a column at a time, written in blocks. That final block S is
$D - CA^{-1}B$, just like d − cb/a. This is called the Schur complement.
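
A small numerical sketch of equation (4), in the simplest case where the block A is the 1 by 1 pivot a. The first column 1, 3, 4 follows Example 4; the remaining entries of M are made up for illustration, and the variable names are assumptions.

```python
from fractions import Fraction

def matmul(M, N):
    cols = list(zip(*N))
    return [[sum(m * n for m, n in zip(row, col)) for col in cols] for row in M]

# First column 1, 3, 4 as in Example 4; the other entries are invented.
M = [[1, 2, 3],
     [3, 8, 7],
     [4, 9, 20]]

a = Fraction(M[0][0])                 # block A: the 1 by 1 pivot
B = M[0][1:]                          # block B: rest of the pivot row
C = [row[0] for row in M[1:]]         # block C: column below the pivot
D = [row[1:] for row in M[1:]]        # block D: remaining 2 by 2 block

# Block elimination matrix [[1, 0], [-C a^{-1}, I]].
E = [[1, 0, 0]] + [[-c / a] + [int(i == k) for k in range(len(C))]
                   for i, c in enumerate(C)]

# Schur complement S = D - C a^{-1} B, computed entry by entry.
S = [[D[i][j] - C[i] / a * B[j] for j in range(len(B))] for i in range(len(C))]

EM = matmul(E, M)
print(EM)   # zeros below the pivot, and S in the lower right block
print(S)
```

For a larger block A the same pattern applies, with a true matrix inverse in place of 1/a.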


■ REVIEW OF THE KEY IDEAS ■


1. The (i, j) entry of AB is (row i of A) · (column j of B).

2. An m by n matrix times an n by p matrix uses mnp separate multiplications.

3. A times BC equals AB times C (surprisingly important).

4. AB is also the sum of these matrices: (column j of A) times (row j of B).

5. Block multiplication is allowed when the block shapes match correctly.

6. Block elimination produces the Schur complement $D - CA^{-1}B$.


■ WORKED EXAMPLES ■


2.4A Put yourself in the position of the author! I want to show you matrix multiplica- 
tions that are special, but mostly I am stuck with small matrices. There is one terrific fam- 
ily of Pascal matrices, and they come in all sizes, and above all they have real meaning. 
I think 4 by 4 is a good size to show some of their amazing patterns. 

Here is the lower triangular Pascal matrix L. Its entries come from “Pascal’s triangle”. 
I will multiply L times the ones vector, and the powers vector: 


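$$
\text{Pascal matrix} \qquad
\begin{bmatrix} 1 & & & \\ 1 & 1 & & \\ 1 & 2 & 1 & \\ 1 & 3 & 3 & 1 \end{bmatrix}
\begin{bmatrix} 1 \\ 1 \\ 1 \\ 1 \end{bmatrix}
= \begin{bmatrix} 1 \\ 2 \\ 4 \\ 8 \end{bmatrix}
\qquad
\begin{bmatrix} 1 & & & \\ 1 & 1 & & \\ 1 & 2 & 1 & \\ 1 & 3 & 3 & 1 \end{bmatrix}
\begin{bmatrix} 1 \\ x \\ x^2 \\ x^3 \end{bmatrix}
= \begin{bmatrix} 1 \\ 1 + x \\ (1 + x)^2 \\ (1 + x)^3 \end{bmatrix}
$$


Each row of L leads to the next row: Add an entry to the one on its left to get the entry
below. In symbols $\ell_{i\,j} + \ell_{i\,j-1} = \ell_{i+1\,j}$. The numbers after 1, 3, 3, 1 would be 1, 4, 6, 4, 1.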
Pascal lived in the 1600’s, long before matrices, but his triangle fits perfectly into L. 

Multiplying by ones is the same as adding up each row, to get powers of 2. By writing 
out L times powers of x, you see the entries of L as the “binomial coefficients” that are so 
essential to gamblers: 


$$
1 + 2x + 1x^2 = (1 + x)^2 \qquad 1 + 3x + 3x^2 + 1x^3 = (1 + x)^3
$$


The number “3” counts the ways to get Heads once and Tails twice in three coin flips:
HTT and THT and TTH. The other “3” counts the ways to get Heads twice: HHT and
HTH and THH. Those are examples of “i choose j” = the number of ways to get j heads
in i coin flips. That number is exactly $\ell_{ij}$, if we start counting rows and columns of L at
i = 0 and j = 0 (and remember 0! = 1):

$$
\ell_{ij} = \binom{i}{j} = i \text{ choose } j = \frac{i!}{j!\,(i-j)!}
\qquad
\binom{4}{2} = \frac{4!}{2!\,2!} = \frac{24}{(2)(2)} = 6
$$


There are six ways to choose two aces out of four aces. We will see Pascal’s triangle and 
these matrices again. Here are the questions I want to ask now: 


1. What is $H = L^2$? This is the “hypercube matrix”.
2. Multiply H times ones and powers. 


3. The last row of H is 8,12,6,1. A cube has 8 corners, 12 edges, 6 faces, 1 box. 
What would the next row of H tell about a hypercube in 4D? 


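Solution Multiply L times L to get the hypercube matrix $H = L^2$:

$$
\begin{bmatrix} 1 & & & \\ 1 & 1 & & \\ 1 & 2 & 1 & \\ 1 & 3 & 3 & 1 \end{bmatrix}
\begin{bmatrix} 1 & & & \\ 1 & 1 & & \\ 1 & 2 & 1 & \\ 1 & 3 & 3 & 1 \end{bmatrix}
= \begin{bmatrix} 1 & & & \\ 2 & 1 & & \\ 4 & 4 & 1 & \\ 8 & 12 & 6 & 1 \end{bmatrix} = H.
$$

Now multiply H times the vectors of ones and powers:

$$
\begin{bmatrix} 1 & & & \\ 2 & 1 & & \\ 4 & 4 & 1 & \\ 8 & 12 & 6 & 1 \end{bmatrix}
\begin{bmatrix} 1 \\ 1 \\ 1 \\ 1 \end{bmatrix}
= \begin{bmatrix} 1 \\ 3 \\ 9 \\ 27 \end{bmatrix}
\qquad
\begin{bmatrix} 1 & & & \\ 2 & 1 & & \\ 4 & 4 & 1 & \\ 8 & 12 & 6 & 1 \end{bmatrix}
\begin{bmatrix} 1 \\ x \\ x^2 \\ x^3 \end{bmatrix}
= \begin{bmatrix} 1 \\ 2 + x \\ (2 + x)^2 \\ (2 + x)^3 \end{bmatrix}
$$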

If x = 1 we get the powers of 3. If x = 0 we get powers of 2. When L produces powers 
of 1 + x, applying L again produces powers of 2 + x. 
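
These Pascal computations are easy to reproduce. The sketch below builds L from binomial coefficients with `math.comb` (Python 3.8 or later); the function name `pascal` and the test value x = 5 are assumptions chosen for illustration.

```python
from math import comb

def matmul(M, N):
    cols = list(zip(*N))
    return [[sum(m * n for m, n in zip(row, col)) for col in cols] for row in M]

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def pascal(n):
    """Lower triangular Pascal matrix: entry (i, j) is i choose j, counting from 0."""
    return [[comb(i, j) for j in range(n)] for i in range(n)]

L = pascal(4)
H = matmul(L, L)                      # the hypercube matrix

ones = [1, 1, 1, 1]
x = 5                                 # illustrative value of x
powers = [x ** k for k in range(4)]

print(matvec(L, ones), matvec(H, ones))      # powers of 2, then powers of 3
print(matvec(L, powers), matvec(H, powers))  # powers of 1 + x, then of 2 + x

# The 5 by 5 version gives the row for the 4D hypercube.
next_row = matmul(pascal(5), pascal(5))[4]
print(next_row)
```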

How do the rows of H count corners and edges and faces of a cube? A square in 
2D has 4 corners, 4 edges, 1 face. Add one dimension at a time: 


Connect two squares to get a 3D cube. Connect two cubes to get a 4D hypercube. 


The cube has 8 corners and 12 edges: 4 edges in each square and 4 between the squares. 
The cube has 6 faces: 1 in each square and 4 faces between the squares. This row 8, 12,6, 1 
will lead to the next row 16, 32, 24, 8, 1. The rule is $2h_{i\,j} + h_{i\,j-1} = h_{i+1\,j}$.

Can you see this in four dimensions? The hypercube has 16 corners, no problem. It
has 12 edges from one cube, 12 from the other cube, 8 that connect corners of those cubes: 
total 32 edges. It has 6 faces from each separate cube and 12 more from connecting pairs 
of edges: total 2 x 6+ 12 = 24 faces. It has one box from each cube and 6 more from 
connecting pairs of faces: total 8 boxes. And finally 1 hypercube.


2.4B For these matrices, when does AB = BA? When does BC = CB? When does
A times BC equal AB times C? Give the conditions on their entries p, q, r, z:

$$
A = \begin{bmatrix} p & 0 \\ q & r \end{bmatrix}
\qquad
B = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}
\qquad
C = \begin{bmatrix} 0 & z \\ 0 & 0 \end{bmatrix}
$$

If p, q, r, 1, z are 4 by 4 blocks instead of numbers, do the answers change?


Solution First of all, A times BC always equals AB times C. Parentheses are not
needed in A(BC) = (AB)C = ABC. But we must keep the matrices in this order:

$$
\text{Usually } AB \neq BA \qquad
AB = \begin{bmatrix} p & p \\ q & q + r \end{bmatrix}
\qquad
BA = \begin{bmatrix} p + q & r \\ q & r \end{bmatrix}.
$$

$$
\text{By chance } BC = CB \qquad
BC = \begin{bmatrix} 0 & z \\ 0 & 0 \end{bmatrix}
\qquad
CB = \begin{bmatrix} 0 & z \\ 0 & 0 \end{bmatrix}.
$$

B and C happen to commute. Part of the explanation is that the diagonal of B is I, which
commutes with all 2 by 2 matrices. When p, q, r, z are 4 by 4 blocks and 1 changes to I,
all these products remain correct. So the answers are the same.


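2.4C A directed graph starts with n nodes. The n by n adjacency matrix has $a_{ij} = 1$
when an edge leaves node i and enters node j; if no edge then $a_{ij} = 0$.

[Figure: directed graph on nodes 1 and 2, with edges node 1 to node 2, node 1 to node 1, and node 2 to node 1.]

$$
A = \begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix} = \text{adjacency matrix}
$$

The i, j entry of $A^2$ is $\sum_k a_{ik}a_{kj}$. This is $a_{i1}a_{1j} + \cdots + a_{in}a_{nj}$. Why does that sum
count the two-step paths from i to any node to j? The i, j entry of $A^k$ counts k-step paths:

$$
\begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix}^2 = \begin{bmatrix} 2 & 1 \\ 1 & 1 \end{bmatrix}
\quad \text{counts paths with two edges} \quad
\begin{bmatrix} \text{1 to 2 to 1, 1 to 1 to 1} & \text{1 to 1 to 2} \\ \text{2 to 1 to 1} & \text{2 to 1 to 2} \end{bmatrix}
$$

List all of the 3-step paths between each pair of nodes and compare with $A^3$.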


Solution The number $a_{ik}a_{kj}$ will be “1” if there is an edge from node i to k and an
edge from k to j. This is a 2-step path. The number $a_{ik}a_{kj}$ will be “0” if either of those
edges (i to k, k to j) is missing. So the sum of $a_{ik}a_{kj}$ is the number of 2-step paths leaving
i and entering j. Matrix multiplication is just right for this count.

The 3-step paths are counted by $A^3$; we look at paths to node 2:

$$
A^3 = \begin{bmatrix} 3 & 2 \\ 2 & 1 \end{bmatrix}
\quad \text{counts the paths with three steps} \quad
\begin{bmatrix} \cdots & \text{1 to 1 to 1 to 2, 1 to 2 to 1 to 2} \\ \cdots & \text{2 to 1 to 1 to 2} \end{bmatrix}
$$

These $A^k$ contain the Fibonacci numbers 0, 1, 1, 2, 3, 5, 8, 13, ... coming in Section 6.2.
Multiplying A by $A^k$ involves Fibonacci’s rule $F_{k+2} = F_{k+1} + F_k$ (as in 13 = 8 + 5):

$$
(A)(A^k) = \begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix}
\begin{bmatrix} F_{k+1} & F_k \\ F_k & F_{k-1} \end{bmatrix}
= \begin{bmatrix} F_{k+2} & F_{k+1} \\ F_{k+1} & F_k \end{bmatrix} = A^{k+1}.
$$

There are 13 six-step paths from node 1 to node 1, but I can’t find them all.

$A^k$ also counts words. A path like 1 to 1 to 2 to 1 corresponds to the word aaba. The
letter b can’t repeat because there is no edge from 2 to 2. The i, j entry of $A^k$ counts the
words of length k + 1 that start with the ith letter and end with the jth.
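
A brute-force enumeration can find the paths that are hard to list by hand. This sketch compares the entries of $A^k$ with a direct count over all candidate node sequences; the function names `power` and `count_paths` are assumptions for illustration, and nodes are numbered from 1 as in the text.

```python
from itertools import product

def matmul(M, N):
    cols = list(zip(*N))
    return [[sum(m * n for m, n in zip(row, col)) for col in cols] for row in M]

def power(A, k):
    """A multiplied by itself k times (k = 0 gives the identity)."""
    n = len(A)
    result = [[int(i == j) for j in range(n)] for i in range(n)]
    for _ in range(k):
        result = matmul(result, A)
    return result

def count_paths(A, i, j, k):
    """Count k-step paths from node i to node j by trying every sequence of middle nodes."""
    n = len(A)
    count = 0
    for middle in product(range(n), repeat=k - 1):
        nodes = (i - 1,) + middle + (j - 1,)
        if all(A[u][v] == 1 for u, v in zip(nodes, nodes[1:])):
            count += 1
    return count

A = [[1, 1], [1, 0]]          # edges 1 to 1, 1 to 2, 2 to 1

A3 = power(A, 3)
A6 = power(A, 6)
print(A3)
print(A6[0][0], count_paths(A, 1, 1, 6))   # six-step paths from node 1 to node 1
```

The enumeration grows like $n^{k-1}$, so it is only a check for small cases; the matrix power is the practical count.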


Problem Set 2.4 


Problems 1-16 are about the laws of matrix multiplication. 


1 A is 3 by 5, B is 5 by 3, C is 5 by 1, and D is 3 by 1. All entries are 1. Which of 
these matrix operations are allowed, and what are the results? 


BA AB ABD DBA A(B +C). 
2 What rows or columns or matrices do you multiply to find 


(a) the third column of AB? 
(b) the first row of AB? 
(c) the entry in row 3, column 4 of AB?


(d) the entry in row 1, column 1 of CDE? 


3 Add AB to AC and compare with A(B + C): 


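$$
A = \begin{bmatrix} 1 & 5 \\ 2 & 3 \end{bmatrix}
\quad \text{and} \quad
B = \begin{bmatrix} 0 & 2 \\ 0 & 1 \end{bmatrix}
\quad \text{and} \quad
C = \begin{bmatrix} 3 & 1 \\ 0 & 0 \end{bmatrix}.
$$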


4 In Problem 3, multiply A times BC. Then multiply AB times C. 
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Compute A? and A?. Make a prediction for A” and A”: 


1 b 2 2 
a=|) | and 4=|3 ol: 


Show that (A + BY is different from A? + 2AB + B?, when 


1 2 1 0 
a=|) | and s=; o|- 


Write down the correct rule for (A + B)(A + B) = A? + + B?. 
True or false. Give a specific example when false: 


(a) If columns 1 and 3 of B are the same, so are columns 1 and 3 of AB. 
(b) If rows 1 and 3 of B are the same, so are rows 1 and 3 of AB. 

(c) If rows 1 and 3 of A are the same, so are rows 1 and 3 of ABC. 

(d) $(AB)^2 = A^2B^2$.


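8 How is each row of DA and EA related to the rows of A, when

$$D = \begin{bmatrix} 3 & 0 \\ 0 & 5 \end{bmatrix} \quad\text{and}\quad E = \begin{bmatrix} 0 & 1 \\ 0 & 1 \end{bmatrix} \quad\text{and}\quad A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}?$$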


How is each column of AD and AE related to the columns of A? 


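9 Row 1 of A is added to row 2. This gives EA below. Then column 1 of EA is added
to column 2 to produce (EA)F:

$$EA = \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix} \begin{bmatrix} a & b \\ c & d \end{bmatrix} = \begin{bmatrix} a & b \\ a + c & b + d \end{bmatrix}$$

$$\text{and}\quad (EA)F = (EA) \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} a & a + b \\ a + c & a + c + b + d \end{bmatrix}.$$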


(a) Do those steps in the opposite order. First add column 1 of A to column 2 
by AF, then add row 1 of AF to row 2 by E(AF). 
(b) Compare with (EA)F. What law is obeyed by matrix multiplication? 


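10 Row 1 of A is again added to row 2 to produce EA. Then F adds row 2 of EA to
row 1. The result is F(EA):

$$F(EA) = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} a & b \\ a + c & b + d \end{bmatrix} = \begin{bmatrix} 2a + c & 2b + d \\ a + c & b + d \end{bmatrix}.$$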


(a) Do those steps in the opposite order: first add row 2 to row 1 by FA, then add 
row 1 of FA to row 2. 


(b) What law is or is not obeyed by matrix multiplication?

11 (3 by 3 matrices) Choose the only B so that for every matrix A
(a) BA = 4A
(b) BA = 4B
(c) BA has rows 1 and 3 of A reversed and row 2 unchanged
(d) All rows of BA are the same as row 1 of A.


12 Suppose AB = BA and AC = CA for these two particular matrices B and C: 


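$$A = \begin{bmatrix} a & b \\ c & d \end{bmatrix} \quad\text{commutes with}\quad B = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} \quad\text{and}\quad C = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}.$$

Prove that a = d and b = c = 0. Then A is a multiple of I. The only matrices that
commute with B and C and all other 2 by 2 matrices are A = multiple of I.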


13 Which of the following matrices are guaranteed to equal $(A - B)^2$: $A^2 - B^2$,
$(B - A)^2$, $A^2 - 2AB + B^2$, $A(A - B) - B(A - B)$, $A^2 - AB - BA + B^2$?


14 True or false: 
(a) If $A^2$ is defined then A is necessarily square.
(b) If AB and BA are defined then A and B are square. 


(c) If AB and BA are defined then AB and BA are square. 
(d) If AB = B then A = I. 


15 If A is m by n, how many separate multiplications are involved when

(a) A multiplies a vector x with n components?
(b) A multiplies an n by p matrix B?
(c) A multiplies itself to produce $A^2$? Here m = n.


16 For $A = \begin{bmatrix} 2 & -1 \\ 3 & -2 \end{bmatrix}$ and $B = \begin{bmatrix} 1 & 0 & 4 \\ 1 & 0 & 6 \end{bmatrix}$, compute these answers and nothing more:

(a) column 2 of AB
(b) row 2 of AB
(c) row 2 of $AA = A^2$
(d) row 2 of $AAA = A^3$.


Problems 17-19 use $a_{ij}$ for the entry in row i, column j of A.

17 Write down the 3 by 3 matrix A whose entries are

(a) $a_{ij}$ = minimum of i and j
(b) $a_{ij} = (-1)^{i+j}$
(c) $a_{ij} = i/j$.

18 What words would you use to describe each of these classes of matrices? Give a 3
by 3 example in each class. Which matrix belongs to all four classes?

(a) $a_{ij} = 0$ if $i \ne j$
(b) $a_{ij} = 0$ if $i < j$
(c) $a_{ij} = a_{ji}$
(d) $a_{ij} = a_{1j}$.

19 The entries of A are $a_{ij}$. Assuming that zeros don’t appear, what is

(a) the first pivot?
(b) the multiplier $\ell_{31}$ of row 1 to be subtracted from row 3?
(c) the new entry that replaces $a_{32}$ after that subtraction?
(d) the second pivot?
Problems 20-24 involve powers of A. 
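20 Compute $A^2$, $A^3$, $A^4$ and also $Av$, $A^2v$, $A^3v$, $A^4v$ for

$$A = \begin{bmatrix} 0 & 2 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 2 \\ 0 & 0 & 0 & 0 \end{bmatrix} \quad\text{and}\quad v = \begin{bmatrix} x \\ y \\ z \\ t \end{bmatrix}.$$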


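21 Find all the powers $A^2, A^3, \ldots$ and $AB, (AB)^2, \ldots$ for

$$A = \begin{bmatrix} .5 & .5 \\ .5 & .5 \end{bmatrix} \quad\text{and}\quad B = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}.$$

22 By trial and error find real nonzero 2 by 2 matrices such that

$$A^2 = -I \qquad BC = 0 \qquad DE = -ED \ \ (\text{not allowing } DE = 0).$$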


23 (a) Find a nonzero matrix A for which $A^2 = 0$.
(b) Find a matrix that has $A^2 \ne 0$ but $A^3 = 0$.


24 By experiment with n = 2 and n = 3 predict $A^n$ for these matrices:

$$A_1 = \begin{bmatrix} 2 & 1 \\ 0 & 1 \end{bmatrix} \quad\text{and}\quad A_2 = \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix} \quad\text{and}\quad A_3 = \begin{bmatrix} a & b \\ 0 & 0 \end{bmatrix}.$$

Problems 25-31 use column-row multiplication and block multiplication.

25 Multiply A times I using columns of A (3 by 3) times rows of I.

26 Multiply AB using columns times rows:

27 Show that the product of upper triangular matrices is always upper triangular:

$$AB = \begin{bmatrix} x & x & x \\ 0 & x & x \\ 0 & 0 & x \end{bmatrix} \begin{bmatrix} x & x & x \\ 0 & x & x \\ 0 & 0 & x \end{bmatrix} = \begin{bmatrix} & & \\ 0 & & \\ 0 & 0 & \end{bmatrix}.$$

Proof using dot products (Row times column) (Row 2 of A) · (column 1 of B) = 0.
Which other dot products give zeros?

Proof using full matrices (Column times row) Draw x’s and 0’s in (column 2 of A)
times (row 2 of B). Also show (column 3 of A) times (row 3 of B).

28 Draw the cuts in A (2 by 3) and B (3 by 4) and AB to show how each of the four
multiplication rules is really a block multiplication:

(1) Matrix A times columns of B. Columns of AB
(2) Rows of A times the matrix B. Rows of AB
(3) Rows of A times columns of B. Inner products (numbers in AB)
(4) Columns of A times rows of B. Outer products (matrices add to AB)

29 Which matrices $E_{21}$ and $E_{31}$ produce zeros in the (2, 1) and (3, 1) positions of $E_{21}A$
and $E_{31}A$?

$$A = \begin{bmatrix} 2 & 1 & 0 \\ -2 & 0 & 1 \\ 8 & 5 & 3 \end{bmatrix}.$$

Find the single matrix $E = E_{31}E_{21}$ that produces both zeros at once. Multiply EA.

30 Block multiplication says that column 1 is eliminated by

$$EA = \begin{bmatrix} 1 & 0 \\ -c/a & I \end{bmatrix} \begin{bmatrix} a & b \\ c & D \end{bmatrix} = \begin{bmatrix} a & b \\ 0 & D - cb/a \end{bmatrix}.$$

In Problem 29, what are c and D and what is D − cb/a?

31 With $i^2 = -1$, the product of (A + iB) and (x + iy) is Ax + iBx + iAy − By. Use
blocks to separate the real part without i from the imaginary part that multiplies i:

$$\begin{bmatrix} A & -B \\ ? & ? \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} Ax - By \\ ? \end{bmatrix} \quad \begin{matrix} \text{real part} \\ \text{imaginary part} \end{matrix}$$

32 (Very important) Suppose you solve Ax = b for three special right sides b:

$$Ax_1 = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} \quad\text{and}\quad Ax_2 = \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix} \quad\text{and}\quad Ax_3 = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}.$$

If the three solutions $x_1, x_2, x_3$ are the columns of a matrix X, what is A times X?

33 If the three solutions in Question 32 are $x_1 = (1, 1, 1)$ and $x_2 = (0, 1, 1)$ and
$x_3 = (0, 0, 1)$, solve Ax = b when b = (3, 5, 8). Challenge problem: What is A?

34 Find all matrices $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$ that satisfy $A \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix} A$.

35 Suppose a “circle graph” has 4 nodes connected (in both directions) by edges around
a circle. What is its adjacency matrix from Worked Example 2.4 C? What is $A^2$?
Find all the 2-step paths (or 3-letter words) predicted by $A^2$.


Challenge Problems

36 Practical question Suppose A is m by n, B is n by p, and C is p by q. Then
the multiplication count for (AB)C is mnp + mpq. The same answer comes from
A times BC with mnq + npq separate multiplications. Notice npq for BC.

(a) If A is 2 by 4, B is 4 by 7, and C is 7 by 10, do you prefer (AB)C or A(BC)?
(b) With N-component vectors, would you choose $(u^Tv)w^T$ or $u^T(vw^T)$?
(c) Divide by mnpq to show that (AB)C is faster when $n^{-1} + q^{-1} < m^{-1} + p^{-1}$.

37 To prove that (AB)C = A(BC), use the column vectors $b_1, \ldots, b_n$ of B. First
suppose that C has only one column c with entries $c_1, \ldots, c_n$:

AB has columns $Ab_1, \ldots, Ab_n$ and then (AB)c equals $c_1Ab_1 + \cdots + c_nAb_n$.
Bc has one column $c_1b_1 + \cdots + c_nb_n$ and then A(Bc) equals $A(c_1b_1 + \cdots + c_nb_n)$.

Linearity gives equality of those two sums. This proves (AB)c = A(Bc). The same
is true for all other ____ of C. Therefore (AB)C = A(BC). Apply to inverses:

If BA = I and AC = I, prove that the left-inverse B equals the right-inverse C.

2.5 Inverse Matrices


Suppose A is a square matrix. We look for an “inverse matrix” $A^{-1}$ of the same size, such
that $A^{-1}$ times A equals I. Whatever A does, $A^{-1}$ undoes. Their product is the identity
matrix, which does nothing to a vector, so $A^{-1}Ax = x$. But $A^{-1}$ might not exist.

What a matrix mostly does is to multiply a vector x. Multiplying Ax = b by $A^{-1}$
gives $A^{-1}Ax = A^{-1}b$. This is $x = A^{-1}b$. The product $A^{-1}A$ is like multiplying by
a number and then dividing by that number. A number has an inverse if it is not zero;
matrices are more complicated and more interesting. The matrix $A^{-1}$ is called “A inverse.”

$$A^{-1}A = I \quad\text{and}\quad AA^{-1} = I \qquad (1)$$

Not all matrices have inverses. This is the first question we ask about a square matrix:
Is A invertible? We don’t mean that we immediately calculate $A^{-1}$. In most problems
we never compute it! Here are six “notes” about $A^{-1}$.


Note 1 The inverse exists if and only if elimination produces n pivots (row exchanges
are allowed). Elimination solves Ax = b without explicitly using the matrix $A^{-1}$.

Note 2 The matrix A cannot have two different inverses. Suppose BA = I and also
AC = I. Then B = C, according to this “proof by parentheses”:

$$B(AC) = (BA)C \quad\text{gives}\quad BI = IC \quad\text{or}\quad B = C. \qquad (2)$$

This shows that a left-inverse B (multiplying from the left) and a right-inverse C (multiplying
A from the right to give AC = I) must be the same matrix.

Note 3 If A is invertible, the one and only solution to Ax = b is $x = A^{-1}b$:
Multiply Ax = b by $A^{-1}$. Then $x = A^{-1}Ax = A^{-1}b$.

Note 4 (important) Suppose there is a nonzero vector x such that Ax = 0. Then A
cannot have an inverse. No matrix can bring 0 back to x.

If A is invertible, then Ax = 0 can only have the zero solution $x = A^{-1}0 = 0$.


Note 5 A 2 by 2 matrix is invertible if and only if ad − bc is not zero:

$$\text{2 by 2 Inverse:} \qquad \begin{bmatrix} a & b \\ c & d \end{bmatrix}^{-1} = \frac{1}{ad - bc} \begin{bmatrix} d & -b \\ -c & a \end{bmatrix}. \qquad (3)$$

This number ad − bc is the determinant of A. A matrix is invertible if its determinant is not
zero (Chapter 5). The test for n pivots is usually decided before the determinant appears.

Note 6 A diagonal matrix has an inverse provided no diagonal entries are zero:

$$\text{If}\quad A = \begin{bmatrix} d_1 & & \\ & \ddots & \\ & & d_n \end{bmatrix} \quad\text{then}\quad A^{-1} = \begin{bmatrix} 1/d_1 & & \\ & \ddots & \\ & & 1/d_n \end{bmatrix}.$$

Example 1 The 2 by 2 matrix $A = \begin{bmatrix} 1 & 2 \\ 1 & 2 \end{bmatrix}$ is not invertible. It fails the test in Note 5,
because ad − bc equals 2 − 2 = 0. It fails the test in Note 3, because Ax = 0 when
x = (2, −1). It fails to have two pivots as required by Note 1.

Elimination turns the second row of this matrix A into a zero row. 
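The 2 by 2 formula in Note 5 is short enough to try directly. The sketch below is an illustration in Python rather than anything from the text: the function name `inverse_2x2` is hypothetical, exact fractions are used to avoid rounding, and the second test matrix is simply an illustrative invertible choice with integer entries.

```python
from fractions import Fraction

def inverse_2x2(A):
    """Inverse of [[a, b], [c, d]] by the 2 by 2 formula, or None if ad - bc = 0.

    Assumes integer entries so that 1/(ad - bc) is an exact fraction.
    """
    (a, b), (c, d) = A
    det = a * d - b * c
    if det == 0:
        return None                      # zero determinant: no inverse exists
    f = Fraction(1, det)
    return [[d * f, -b * f],
            [-c * f, a * f]]

singular = inverse_2x2([[1, 2], [1, 2]])   # ad - bc = 2 - 2 = 0
A_inv = inverse_2x2([[2, 3], [4, 7]])      # ad - bc = 14 - 12 = 2
```

Returning `None` for the singular case is one possible design; raising an error would serve equally well.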


The Inverse of a Product AB 


For two nonzero numbers a and b, the sum a + b might or might not be invertible. The
numbers a = 3 and b = −3 have inverses 1/3 and −1/3. Their sum a + b = 0 has no inverse.
But the product ab = −9 does have an inverse, which is 1/3 times −1/3.

For two matrices A and B, the situation is similar. It is hard to say much about the
invertibility of A + B. But the product AB has an inverse, if and only if the two factors
A and B are separately invertible (and the same size). The important point is that $A^{-1}$ and
$B^{-1}$ come in reverse order:

$$(AB)^{-1} = B^{-1}A^{-1}. \qquad (4)$$

To see why the order is reversed, multiply AB times $B^{-1}A^{-1}$. Inside that is $BB^{-1} = I$:

$$\text{Inverse of } AB \qquad (AB)(B^{-1}A^{-1}) = AIA^{-1} = AA^{-1} = I.$$

We moved parentheses to multiply $BB^{-1}$ first. Similarly $B^{-1}A^{-1}$ times AB equals I. This
illustrates a basic rule of mathematics: Inverses come in reverse order. It is also common
sense: If you put on socks and then shoes, the first to be taken off are the ____. The same
reverse order applies to three or more matrices:

$$\text{Reverse order} \qquad (ABC)^{-1} = C^{-1}B^{-1}A^{-1}. \qquad (5)$$


Example 2 Inverse of an elimination matrix. If E subtracts 5 times row 1 from row 2,
then $E^{-1}$ adds 5 times row 1 to row 2:

$$E = \begin{bmatrix} 1 & 0 & 0 \\ -5 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \quad\text{and}\quad E^{-1} = \begin{bmatrix} 1 & 0 & 0 \\ 5 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}.$$

Multiply $EE^{-1}$ to get the identity matrix I. Also multiply $E^{-1}E$ to get I. We are adding
and subtracting the same 5 times row 1. Whether we add and then subtract (this is $EE^{-1}$)
or subtract and then add (this is $E^{-1}E$), we are back at the start.

For square matrices, an inverse on one side is automatically an inverse on the other side.
If AB = I then automatically BA = I. In that case B is $A^{-1}$. This is very useful to know
but we are not ready to prove it.


Example 3 Suppose F subtracts 4 times row 2 from row 3, and $F^{-1}$ adds it back:

$$F = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -4 & 1 \end{bmatrix} \quad\text{and}\quad F^{-1} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 4 & 1 \end{bmatrix}.$$

Now multiply F by the matrix E in Example 2 to find FE. Also multiply $E^{-1}$ times $F^{-1}$
to find $(FE)^{-1}$. Notice the orders FE and $E^{-1}F^{-1}$!

$$FE = \begin{bmatrix} 1 & 0 & 0 \\ -5 & 1 & 0 \\ 20 & -4 & 1 \end{bmatrix} \quad\text{is inverted by}\quad E^{-1}F^{-1} = \begin{bmatrix} 1 & 0 & 0 \\ 5 & 1 & 0 \\ 0 & 4 & 1 \end{bmatrix}. \qquad (6)$$

The result is beautiful and correct. The product FE contains “20” but its inverse doesn’t.
E subtracts 5 times row 1 from row 2. Then F subtracts 4 times the new row 2 (changed
by row 1) from row 3. In this order FE, row 3 feels an effect from row 1.

In the order $E^{-1}F^{-1}$, that effect does not happen. First $F^{-1}$ adds 4 times row 2 to
row 3. After that, $E^{-1}$ adds 5 times row 1 to row 2. There is no 20, because row 3 doesn’t
change again. In this order $E^{-1}F^{-1}$, row 3 feels no effect from row 1.
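The two products in Example 3 can be multiplied out by machine. This is a small Python sketch for illustration only; `matmul` is a hypothetical helper, and the matrices are the E, F and their inverses from Examples 2 and 3.

```python
def matmul(X, Y):
    """Row-times-column product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

I3 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]

E     = [[1, 0, 0], [-5, 1, 0], [0, 0, 1]]   # subtract 5 times row 1 from row 2
E_inv = [[1, 0, 0], [5, 1, 0], [0, 0, 1]]    # add it back
F     = [[1, 0, 0], [0, 1, 0], [0, -4, 1]]   # subtract 4 times row 2 from row 3
F_inv = [[1, 0, 0], [0, 1, 0], [0, 4, 1]]    # add it back

FE = matmul(F, E)                    # the entry 20 appears in row 3
FE_inv = matmul(E_inv, F_inv)        # reverse order: no 20, just 5 and 4
wrong_order = matmul(F_inv, E_inv)   # same order as FE: not the inverse
```

Multiplying `FE` by `FE_inv` gives the identity, while multiplying `FE` by `wrong_order` does not, which is the reverse-order rule in action.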


In the order $E^{-1}F^{-1}$, the multipliers 5 and 4 fall into place below the diagonal of 1’s.


This special multiplication $E^{-1}F^{-1}$ and $E^{-1}F^{-1}G^{-1}$ will be useful in the next section.
We will explain it again, more completely. In this section our job is $A^{-1}$, and we
expect some serious work to compute it. Here is a way to organize that computation.


Calculating $A^{-1}$ by Gauss-Jordan Elimination

I hinted that $A^{-1}$ might not be explicitly needed. The equation Ax = b is solved by
$x = A^{-1}b$. But it is not necessary or efficient to compute $A^{-1}$ and multiply it times b.
Elimination goes directly to x. Elimination is also the way to calculate $A^{-1}$, as we now
show. The Gauss-Jordan idea is to solve $AA^{-1} = I$, finding each column of $A^{-1}$.

A multiplies the first column of $A^{-1}$ (call that $x_1$) to give the first column of I (call
that $e_1$). This is our equation $Ax_1 = e_1 = (1, 0, 0)$. There will be two more equations.
Each of the columns $x_1, x_2, x_3$ of $A^{-1}$ is multiplied by A to produce a column of I:

$$\text{3 columns of } A^{-1} \qquad AA^{-1} = A \begin{bmatrix} x_1 & x_2 & x_3 \end{bmatrix} = \begin{bmatrix} e_1 & e_2 & e_3 \end{bmatrix} = I. \qquad (7)$$

To invert a 3 by 3 matrix A, we have to solve three systems of equations: $Ax_1 = e_1$ and
$Ax_2 = e_2 = (0, 1, 0)$ and $Ax_3 = e_3 = (0, 0, 1)$. Gauss-Jordan finds $A^{-1}$ this way.

The Gauss-Jordan method computes $A^{-1}$ by solving all n equations together.
Usually the “augmented matrix” [A b] has one extra column b. Now we have three
right sides $e_1, e_2, e_3$ (when A is 3 by 3). They are the columns of I, so the augmented
matrix is really the block matrix [A I]. I take this chance to invert my favorite matrix K,
with 2’s on the main diagonal and −1’s next to the 2’s:

$$\begin{bmatrix} K & e_1 & e_2 & e_3 \end{bmatrix} = \left[\begin{array}{rrr|rrr} 2 & -1 & 0 & 1 & 0 & 0 \\ -1 & 2 & -1 & 0 & 1 & 0 \\ 0 & -1 & 2 & 0 & 0 & 1 \end{array}\right] \qquad \text{Start Gauss-Jordan on } K$$

$$\rightarrow \left[\begin{array}{rrr|rrr} 2 & -1 & 0 & 1 & 0 & 0 \\ 0 & \tfrac{3}{2} & -1 & \tfrac{1}{2} & 1 & 0 \\ 0 & -1 & 2 & 0 & 0 & 1 \end{array}\right] \qquad (\tfrac{1}{2} \text{ row 1} + \text{row 2})$$

$$\rightarrow \left[\begin{array}{rrr|rrr} 2 & -1 & 0 & 1 & 0 & 0 \\ 0 & \tfrac{3}{2} & -1 & \tfrac{1}{2} & 1 & 0 \\ 0 & 0 & \tfrac{4}{3} & \tfrac{1}{3} & \tfrac{2}{3} & 1 \end{array}\right] \qquad (\tfrac{2}{3} \text{ row 2} + \text{row 3})$$


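We are halfway to $K^{-1}$. The matrix in the first three columns is U (upper triangular). The
pivots 2, 3/2, 4/3 are on its diagonal. Gauss would finish by back substitution. The contribution
of Jordan is to continue with elimination! He goes all the way to the “reduced echelon
form”. Rows are added to rows above them, to produce zeros above the pivots:

$$\begin{pmatrix} \text{Zero above} \\ \text{third pivot} \end{pmatrix} \rightarrow \left[\begin{array}{rrr|rrr} 2 & -1 & 0 & 1 & 0 & 0 \\ 0 & \tfrac{3}{2} & 0 & \tfrac{3}{4} & \tfrac{3}{2} & \tfrac{3}{4} \\ 0 & 0 & \tfrac{4}{3} & \tfrac{1}{3} & \tfrac{2}{3} & 1 \end{array}\right] \qquad (\tfrac{3}{4} \text{ row 3} + \text{row 2})$$

$$\begin{pmatrix} \text{Zero above} \\ \text{second pivot} \end{pmatrix} \rightarrow \left[\begin{array}{rrr|rrr} 2 & 0 & 0 & \tfrac{3}{2} & 1 & \tfrac{1}{2} \\ 0 & \tfrac{3}{2} & 0 & \tfrac{3}{4} & \tfrac{3}{2} & \tfrac{3}{4} \\ 0 & 0 & \tfrac{4}{3} & \tfrac{1}{3} & \tfrac{2}{3} & 1 \end{array}\right] \qquad (\tfrac{2}{3} \text{ row 2} + \text{row 1})$$

The last Gauss-Jordan step is to divide each row by its pivot. The new pivots are 1. We
have reached I in the first half of the matrix, because K is invertible. The three columns
of $K^{-1}$ are in the second half of $[\,I \ \ K^{-1}\,]$:

$$\begin{matrix} (\text{divide by } 2) \\ (\text{divide by } \tfrac{3}{2}) \\ (\text{divide by } \tfrac{4}{3}) \end{matrix} \quad \left[\begin{array}{rrr|rrr} 1 & 0 & 0 & \tfrac{3}{4} & \tfrac{1}{2} & \tfrac{1}{4} \\ 0 & 1 & 0 & \tfrac{1}{2} & 1 & \tfrac{1}{2} \\ 0 & 0 & 1 & \tfrac{1}{4} & \tfrac{1}{2} & \tfrac{3}{4} \end{array}\right] = \begin{bmatrix} I & x_1 & x_2 & x_3 \end{bmatrix} = \begin{bmatrix} I & K^{-1} \end{bmatrix}.$$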


Starting from the 3 by 6 matrix [K I], we ended with $[\,I \ \ K^{-1}\,]$. Here is the whole
Gauss-Jordan process on one line for any invertible matrix A:

Gauss-Jordan    Multiply [A I] by $A^{-1}$ to get $[\,I \ \ A^{-1}\,]$.

The elimination steps create the inverse matrix while changing A to I. For large matrices,
we probably don’t want $A^{-1}$ at all. But for small matrices, it can be very worthwhile to
know the inverse. We add three observations about this particular $K^{-1}$ because it is an
important example. We introduce the words symmetric, tridiagonal, and determinant:


1. K is symmetric across its main diagonal. So is $K^{-1}$.

2. K is tridiagonal (only three nonzero diagonals). But $K^{-1}$ is a dense matrix with
no zeros. That is another reason we don’t often compute inverse matrices. The
inverse of a band matrix is generally a dense matrix.

3. The product of pivots is 2(3/2)(4/3) = 4. This number 4 is the determinant of K.

$$K^{-1} \text{ involves division by the determinant} \qquad K^{-1} = \frac{1}{4} \begin{bmatrix} 3 & 2 & 1 \\ 2 & 4 & 2 \\ 1 & 2 & 3 \end{bmatrix}. \qquad (8)$$


This is why an invertible matrix cannot have a zero determinant. 
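The Gauss-Jordan steps on [K I] can be reproduced with exact fractions. What follows is a minimal sketch under stated assumptions, not the book's code: the function name `gauss_jordan_inverse` is hypothetical, the matrix is assumed square and invertible, and no row exchanges are attempted (none are needed for K).

```python
from fractions import Fraction

def gauss_jordan_inverse(A):
    """Reduce [A I] to [I inverse] with exact fractions.

    Returns (inverse, pivots). Raises ValueError when a zero lands in a
    pivot position, since this sketch does not exchange rows.
    """
    n = len(A)
    # Build the augmented matrix [A I].
    M = [[Fraction(A[i][j]) for j in range(n)] +
         [Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    # Downward pass: produce zeros below each pivot (left half becomes U).
    for j in range(n):
        if M[j][j] == 0:
            raise ValueError("zero pivot: a row exchange would be needed")
        for i in range(j + 1, n):
            m = M[i][j] / M[j][j]
            M[i] = [x - m * y for x, y in zip(M[i], M[j])]
    pivots = [M[j][j] for j in range(n)]
    # Upward pass (Jordan's contribution): zeros above each pivot.
    for j in range(n - 1, -1, -1):
        for i in range(j):
            m = M[i][j] / M[j][j]
            M[i] = [x - m * y for x, y in zip(M[i], M[j])]
    # Divide each row by its pivot; the right half is then the inverse.
    inverse = [[x / M[i][i] for x in M[i][n:]] for i in range(n)]
    return inverse, pivots

K = [[2, -1, 0], [-1, 2, -1], [0, -1, 2]]
K_inv, pivots = gauss_jordan_inverse(K)
det_K = pivots[0] * pivots[1] * pivots[2]   # product of the pivots
```

The returned pivots and their product can be compared with observation 3 above, and `K_inv` with equation (8).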


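Example 4 Find $A^{-1}$ by Gauss-Jordan elimination starting from $A = \begin{bmatrix} 2 & 3 \\ 4 & 7 \end{bmatrix}$. There are
two row operations and then a division to put 1’s in the pivots:

$$\begin{bmatrix} A & I \end{bmatrix} = \left[\begin{array}{rr|rr} 2 & 3 & 1 & 0 \\ 4 & 7 & 0 & 1 \end{array}\right] \rightarrow \left[\begin{array}{rr|rr} 2 & 3 & 1 & 0 \\ 0 & 1 & -2 & 1 \end{array}\right] \qquad (\text{this is } [\,U \ \ L^{-1}\,])$$

$$\rightarrow \left[\begin{array}{rr|rr} 2 & 0 & 7 & -3 \\ 0 & 1 & -2 & 1 \end{array}\right] \rightarrow \left[\begin{array}{rr|rr} 1 & 0 & \tfrac{7}{2} & -\tfrac{3}{2} \\ 0 & 1 & -2 & 1 \end{array}\right] \qquad (\text{this is } [\,I \ \ A^{-1}\,]).$$

That $A^{-1}$ involves division by the determinant ad − bc = 2 · 7 − 3 · 4 = 2. The code for
X = inverse(A) can use rref, the “row reduced echelon form” from Chapter 3:

    I = eye(n);            % Define the n by n identity matrix
    R = rref([A I]);       % Eliminate on the augmented matrix [A I]
    X = R(:, n+1:n+n)      % Pick the inverse of A from the last n columns of R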


A must be invertible, or elimination cannot reduce it to I (in the left half of R).
Gauss-Jordan shows why $A^{-1}$ is expensive. We must solve n equations for its n columns.

To solve Ax = b without $A^{-1}$, we deal with one column b to find one column x.

In defense of $A^{-1}$, we want to say that its cost is not n times the cost of one system
Ax = b. Surprisingly, the cost for n columns is only multiplied by 3. This saving is
because the n equations $Ax_i = e_i$ all involve the same matrix A. Working with the right
sides is relatively cheap, because elimination only has to be done once on A.

The complete $A^{-1}$ needs $n^3$ elimination steps, where a single x needs $n^3/3$. The next
section calculates these costs.

Singular versus Invertible

We come back to the central question. Which matrices have inverses? The start of this
section proposed the pivot test: $A^{-1}$ exists exactly when A has a full set of n pivots.
(Row exchanges are allowed.) Now we can prove that by Gauss-Jordan elimination:


1. With n pivots, elimination solves all the equations $Ax_i = e_i$. The columns $x_i$ go
into $A^{-1}$. Then $AA^{-1} = I$ and $A^{-1}$ is at least a right-inverse.

2. Elimination is really a sequence of multiplications by E’s and P’s and $D^{-1}$:

$$\text{Left-inverse} \qquad (D^{-1} \cdots E \cdots P \cdots E)A = I. \qquad (9)$$

$D^{-1}$ divides by the pivots. The matrices E produce zeros below and above the pivots.
P will exchange rows if needed (see Section 2.7). The product matrix in equation (9) is
evidently a left-inverse. With n pivots we have reached $A^{-1}A = I$.

The right-inverse equals the left-inverse. That was Note 2 at the start of this section.
So a square matrix with a full set of pivots will always have a two-sided inverse.

Reasoning in reverse will now show that A must have n pivots if AC = I. (Then we
deduce that C is also a left-inverse and CA = I.) Here is one route to those conclusions:


1. If A doesn’t have n pivots, elimination will lead to a zero row. 

2. Those elimination steps are taken by an invertible M. So a row of MA is zero. 

3. If AC = I had been possible, then MAC = M. The zero row of MA, times C, 
gives a zero row of M itself. 

4. An invertible matrix M can’t have a zero row! A must have n pivots if AC = I. 


That argument took four steps, but the outcome is short and important. 


If AC = I then CA = I and $C = A^{-1}$.
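The pivot test itself can be written as a short routine. The sketch below is one possible Python version, offered as an illustration: the names `count_pivots` and `is_invertible` are hypothetical, exact fractions are assumed so that "zero" means exactly zero, and rows are exchanged whenever a zero sits in the pivot position.

```python
from fractions import Fraction

def count_pivots(A):
    """Count the pivots found by elimination, exchanging rows when needed."""
    n = len(A)
    M = [[Fraction(x) for x in row] for row in A]
    pivots = 0
    row = 0
    for col in range(n):
        # Look for a nonzero entry in this column, at or below the current row.
        r = next((i for i in range(row, n) if M[i][col] != 0), None)
        if r is None:
            continue                      # no pivot in this column
        M[row], M[r] = M[r], M[row]       # row exchange (possibly a no-op)
        for i in range(row + 1, n):
            m = M[i][col] / M[row][col]
            M[i] = [x - m * y for x, y in zip(M[i], M[row])]
        pivots += 1
        row += 1
    return pivots

def is_invertible(A):
    """The pivot test: invertible exactly when there are n pivots."""
    return count_pivots(A) == len(A)

example_1 = [[1, 2], [1, 2]]                    # second row becomes zero
needs_exchange = [[0, 2], [2, 2]]               # zero in the first pivot position
K = [[2, -1, 0], [-1, 2, -1], [0, -1, 2]]
results = [is_invertible(M) for M in (example_1, needs_exchange, K)]
```

With floating-point entries a tolerance would be needed in place of the exact comparison with zero; that choice is outside this sketch.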
Example 5 If L is lower triangular with 1’s on the diagonal, so is $L^{-1}$.

A triangular matrix is invertible if and only if no diagonal entries are zero.

Here L has 1’s so $L^{-1}$ also has 1’s. Use the Gauss-Jordan method to construct $L^{-1}$. Start
by subtracting multiples of pivot rows from rows below. Normally this gets us halfway to
the inverse, but for L it gets us all the way. $L^{-1}$ appears on the right when I appears on
the left. Notice how $L^{-1}$ contains 11, from 3 times 5 minus 4.

$$\text{Gauss-Jordan on triangular } L \qquad \left[\begin{array}{rrr|rrr} 1 & 0 & 0 & 1 & 0 & 0 \\ 3 & 1 & 0 & 0 & 1 & 0 \\ 4 & 5 & 1 & 0 & 0 & 1 \end{array}\right] = \begin{bmatrix} L & I \end{bmatrix}$$

$$\rightarrow \left[\begin{array}{rrr|rrr} 1 & 0 & 0 & 1 & 0 & 0 \\ 0 & 1 & 0 & -3 & 1 & 0 \\ 0 & 5 & 1 & -4 & 0 & 1 \end{array}\right] \qquad \begin{matrix} (3 \text{ times row 1 from row 2}) \\ (4 \text{ times row 1 from row 3}) \end{matrix}$$

$$\rightarrow \left[\begin{array}{rrr|rrr} 1 & 0 & 0 & 1 & 0 & 0 \\ 0 & 1 & 0 & -3 & 1 & 0 \\ 0 & 0 & 1 & 11 & -5 & 1 \end{array}\right] = \begin{bmatrix} I & L^{-1} \end{bmatrix} \qquad (\text{then 5 times row 2 from row 3})$$

L goes to I by a product of elimination matrices $E_{32}E_{31}E_{21}$. So that product is $L^{-1}$.
All pivots are 1’s (a full set). $L^{-1}$ is lower triangular, with the strange entry “11”.
That 11 does not appear to spoil 3, 4, 5 in the good order $E_{21}^{-1}E_{31}^{-1}E_{32}^{-1} = L$.
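The two orders of multiplication in Example 5 can be compared directly. This Python sketch is illustrative only; `matmul` and `elimination_matrix` are hypothetical helper names, and the multipliers 3, 4, 5 are those of the example.

```python
def matmul(X, Y):
    """Row-times-column product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

def elimination_matrix(i, j, multiplier, n=3):
    """Identity with -multiplier in row i, column j (1-based indices).

    Multiplying by it subtracts `multiplier` times row j from row i.
    """
    E = [[int(r == c) for c in range(n)] for r in range(n)]
    E[i - 1][j - 1] = -multiplier
    return E

E21, E31, E32 = (elimination_matrix(2, 1, 3),
                 elimination_matrix(3, 1, 4),
                 elimination_matrix(3, 2, 5))
# The inverses add back what was subtracted: flip the sign of the multiplier.
E21_inv, E31_inv, E32_inv = (elimination_matrix(2, 1, -3),
                             elimination_matrix(3, 1, -4),
                             elimination_matrix(3, 2, -5))

L_inv = matmul(E32, matmul(E31, E21))            # elimination order: 11 appears
L = matmul(E21_inv, matmul(E31_inv, E32_inv))    # reverse order: 3, 4, 5 unspoiled
```

In the reverse order the multipliers drop into L unchanged, which is the pattern the next section builds on.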


REVIEW OF THE KEY IDEAS

1. The inverse matrix gives $AA^{-1} = I$ and $A^{-1}A = I$.

2. A is invertible if and only if it has n pivots (row exchanges allowed).

3. If Ax = 0 for a nonzero vector x, then A has no inverse.

4. The inverse of AB is the reverse product $B^{-1}A^{-1}$. And $(ABC)^{-1} = C^{-1}B^{-1}A^{-1}$.

5. The Gauss-Jordan method solves $AA^{-1} = I$ to find the n columns of $A^{-1}$. The
augmented matrix [A I] is row-reduced to $[\,I \ \ A^{-1}\,]$.


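WORKED EXAMPLES

2.5 A The inverse of a triangular difference matrix A is a triangular sum matrix S:

$$\begin{bmatrix} A & I \end{bmatrix} = \left[\begin{array}{rrr|rrr} 1 & 0 & 0 & 1 & 0 & 0 \\ -1 & 1 & 0 & 0 & 1 & 0 \\ 0 & -1 & 1 & 0 & 0 & 1 \end{array}\right] \rightarrow \left[\begin{array}{rrr|rrr} 1 & 0 & 0 & 1 & 0 & 0 \\ 0 & 1 & 0 & 1 & 1 & 0 \\ 0 & -1 & 1 & 0 & 0 & 1 \end{array}\right]$$

$$\rightarrow \left[\begin{array}{rrr|rrr} 1 & 0 & 0 & 1 & 0 & 0 \\ 0 & 1 & 0 & 1 & 1 & 0 \\ 0 & 0 & 1 & 1 & 1 & 1 \end{array}\right] = \begin{bmatrix} I & A^{-1} \end{bmatrix} = \begin{bmatrix} I & \text{sum matrix} \end{bmatrix}.$$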
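Differences and running sums undo each other, and that is all this inverse says. The Python sketch below is an illustration with hypothetical helper names `matmul` and `matvec`; the test vector (3, 5, 8) is an arbitrary choice.

```python
def matmul(X, Y):
    """Row-times-column product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

def matvec(M, x):
    """Matrix times vector, one dot product per row."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

A = [[1, 0, 0], [-1, 1, 0], [0, -1, 1]]   # difference matrix
S = [[1, 0, 0], [1, 1, 0], [1, 1, 1]]     # sum matrix

AS = matmul(A, S)
SA = matmul(S, A)

x = [3, 5, 8]
b = matvec(A, x)            # differences: (x1, x2 - x1, x3 - x2)
recovered = matvec(S, b)    # running sums of b bring back x
```

Both products `AS` and `SA` come out as the identity, so S is a two-sided inverse of A.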


If I change $a_{13}$ to −1, then all rows of A add to zero. The equation Ax = 0 will now
have the nonzero solution x = (1, 1, 1). A clear signal: This new A can’t be inverted.

2.5 B Three of these matrices are invertible, and three are singular. Find the inverse
when it exists. Give reasons for noninvertibility (zero determinant, too few pivots, nonzero
solution to Ax = 0) for the other three. The matrices are in the order A, B, C, D, S, E:

$$\begin{bmatrix} 4 & 3 \\ 8 & 6 \end{bmatrix} \quad \begin{bmatrix} 4 & 3 \\ 8 & 7 \end{bmatrix} \quad \begin{bmatrix} 6 & 6 \\ 6 & 0 \end{bmatrix} \quad \begin{bmatrix} 6 & 6 \\ 6 & 6 \end{bmatrix} \quad \begin{bmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 1 & 1 & 1 \end{bmatrix} \quad \begin{bmatrix} 1 & 1 & 1 \\ 1 & 1 & 0 \\ 1 & 1 & 1 \end{bmatrix}$$

Solution

$$B^{-1} = \frac{1}{4} \begin{bmatrix} 7 & -3 \\ -8 & 4 \end{bmatrix} \qquad C^{-1} = \frac{1}{36} \begin{bmatrix} 0 & 6 \\ 6 & -6 \end{bmatrix} \qquad S^{-1} = \begin{bmatrix} 1 & 0 & 0 \\ -1 & 1 & 0 \\ 0 & -1 & 1 \end{bmatrix}$$

A is not invertible because its determinant is 4 · 6 − 3 · 8 = 24 − 24 = 0. D is not
invertible because there is only one pivot; the second row becomes zero when the first row
is subtracted. E is not invertible because a combination of the columns (the second column
minus the first column) is zero; in other words Ex = 0 has the solution x = (−1, 1, 0).
Of course all three reasons for noninvertibility would apply to each of A, D, E.


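2.5 C Apply the Gauss-Jordan method to invert this triangular “Pascal matrix” L.
You see Pascal’s triangle: adding each entry to the entry on its left gives the entry below.
The entries of L are “binomial coefficients”. The next row would be 1, 4, 6, 4, 1.

$$\text{Triangular Pascal matrix} \qquad L = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ 1 & 2 & 1 & 0 \\ 1 & 3 & 3 & 1 \end{bmatrix} = \texttt{abs(pascal(4,1))}$$

Solution Gauss-Jordan starts with [L I] and produces zeros by subtracting row 1:

$$\begin{bmatrix} L & I \end{bmatrix} = \left[\begin{array}{rrrr|rrrr} 1 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\ 1 & 1 & 0 & 0 & 0 & 1 & 0 & 0 \\ 1 & 2 & 1 & 0 & 0 & 0 & 1 & 0 \\ 1 & 3 & 3 & 1 & 0 & 0 & 0 & 1 \end{array}\right] \rightarrow \left[\begin{array}{rrrr|rrrr} 1 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & -1 & 1 & 0 & 0 \\ 0 & 2 & 1 & 0 & -1 & 0 & 1 & 0 \\ 0 & 3 & 3 & 1 & -1 & 0 & 0 & 1 \end{array}\right]$$

The next stage creates zeros below the second pivot, using multipliers 2 and 3. Then the
last stage subtracts 3 times the new row 3 from the new row 4:

$$\rightarrow \left[\begin{array}{rrrr|rrrr} 1 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & -1 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 & 1 & -2 & 1 & 0 \\ 0 & 0 & 3 & 1 & 2 & -3 & 0 & 1 \end{array}\right] \rightarrow \left[\begin{array}{rrrr|rrrr} 1 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & -1 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 & 1 & -2 & 1 & 0 \\ 0 & 0 & 0 & 1 & -1 & 3 & -3 & 1 \end{array}\right] = \begin{bmatrix} I & L^{-1} \end{bmatrix}.$$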
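The result of this elimination can be checked, and the same sign pattern tried at other sizes. The Python sketch below is an illustration only: `pascal_lower` and `signed_pascal` are hypothetical names, and the entries are taken to be binomial coefficients, with a factor $(-1)^{i+j}$ in the candidate inverse.

```python
from math import comb

def matmul(X, Y):
    """Row-times-column product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

def pascal_lower(n):
    """Lower triangular Pascal matrix: entry (i, j) is 'i choose j' (0-based)."""
    return [[comb(i, j) for j in range(n)] for i in range(n)]

def signed_pascal(n):
    """Same entries with sign (-1)^(i+j): the candidate for the inverse."""
    return [[(-1) ** (i + j) * comb(i, j) for j in range(n)] for i in range(n)]

def identity(n):
    return [[int(i == j) for j in range(n)] for i in range(n)]

L4 = pascal_lower(4)
L4_inv = signed_pascal(4)
check_4 = matmul(L4, L4_inv) == identity(4)
check_6 = matmul(pascal_lower(6), signed_pascal(6)) == identity(6)
```

`math.comb` returns 0 when j exceeds i, so the zeros above the diagonal need no special case.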


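All the pivots were 1! So we didn’t need to divide rows by pivots to get I. The inverse
matrix $L^{-1}$ looks like L itself, except odd-numbered diagonals have minus signs.
The same pattern continues to n by n Pascal matrices: $L^{-1}$ has “alternating diagonals”.

Problem Set 2.5

1 Find the inverses (directly or from the 2 by 2 formula) of A, B, C:

$$A = \begin{bmatrix} 0 & 3 \\ 4 & 0 \end{bmatrix} \quad\text{and}\quad B = \begin{bmatrix} 2 & 0 \\ 4 & 2 \end{bmatrix} \quad\text{and}\quad C = \begin{bmatrix} 3 & 4 \\ 5 & 7 \end{bmatrix}.$$

2 For these “permutation matrices” find $P^{-1}$ by trial and error (with 1’s and 0’s):

$$P = \begin{bmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{bmatrix} \quad\text{and}\quad P = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{bmatrix}.$$

3 Solve for the first column (x, y) and second column (t, z) of $A^{-1}$:

$$\begin{bmatrix} 10 & 20 \\ 20 & 50 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 1 \\ 0 \end{bmatrix} \quad\text{and}\quad \begin{bmatrix} 10 & 20 \\ 20 & 50 \end{bmatrix} \begin{bmatrix} t \\ z \end{bmatrix} = \begin{bmatrix} 0 \\ 1 \end{bmatrix}.$$

4 Show that $\begin{bmatrix} 1 & 2 \\ 3 & 6 \end{bmatrix}$ is not invertible by trying to solve $AA^{-1} = I$ for column 1 of $A^{-1}$:

$$\begin{bmatrix} 1 & 2 \\ 3 & 6 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 1 \\ 0 \end{bmatrix} \qquad \begin{pmatrix} \text{For a different } A \text{, could column 1 of } A^{-1} \\ \text{be possible to find but not column 2?} \end{pmatrix}$$

5 Find an upper triangular U (not diagonal) with $U^2 = I$ which gives $U = U^{-1}$.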


6 (a) If A is invertible and AB = AC, prove quickly that B = C. 
(b) If $A = \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}$, find two different matrices such that AB = AC.


7 (Important) If A has row 1 + row 2 = row 3, show that A is not invertible: 


(a) Explain why Ax = (1,0, 0) cannot have a solution. 
(b) Which right sides (b1, b2, b3) might allow a solution to Ax = b? 
(c) What happens to row 3 in elimination? 


8 If A has column 1 + column 2 = column 3, show that A is not invertible: 


(a) Find a nonzero solution x to Ax = 0. The matrix is 3 by 3. 


(b) Elimination keeps column 1 + column 2 = column 3. Explain why there is no 
third pivot. 


9 Suppose A is invertible and you exchange its first two rows to reach B. Is the new
matrix B invertible and how would you find $B^{-1}$ from $A^{-1}$?


10 Find the inverses (in any legal way) of

$$A = \begin{bmatrix} 0 & 0 & 0 & 2 \\ 0 & 0 & 3 & 0 \\ 0 & 4 & 0 & 0 \\ 5 & 0 & 0 & 0 \end{bmatrix} \quad\text{and}\quad B = \begin{bmatrix} 3 & 2 & 0 & 0 \\ 4 & 3 & 0 & 0 \\ 0 & 0 & 6 & 5 \\ 0 & 0 & 7 & 6 \end{bmatrix}.$$

11 (a) Find invertible matrices A and B such that A + B is not invertible.
(b) Find singular matrices A and B such that A + B is invertible.

12 If the product C = AB is invertible (A and B are square), then A itself is invertible.
Find a formula for $A^{-1}$ that involves $C^{-1}$ and B.

13 If the product M = ABC of three square matrices is invertible, then B is invertible.
(So are A and C.) Find a formula for $B^{-1}$ that involves $M^{-1}$ and A and C.

14 If you add row 1 of A to row 2 to get B, how do you find $B^{-1}$ from $A^{-1}$?

Notice the order. The inverse of $B = \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix} \begin{bmatrix} A \end{bmatrix}$ is ____.

15 Prove that a matrix with a column of zeros cannot have an inverse.

16 Multiply $\begin{bmatrix} a & b \\ c & d \end{bmatrix}$ times $\begin{bmatrix} d & -b \\ -c & a \end{bmatrix}$. What is the inverse of each matrix if $ad \ne bc$?


17 (a) What 3 by 3 matrix E has the same effect as these three steps? Subtract row 1 
from row 2, subtract row 1 from row 3, then subtract row 2 from row 3. 


(b) What single matrix L has the same effect as these three reverse steps? Add row 
2 to row 3, add row 1 to row 3, then add row 1 to row 2. 


18 If B is the inverse of $A^2$, show that AB is the inverse of A.

19 Find the numbers a and b that give the inverse of 5 * eye(4) − ones(4,4):

$$\begin{bmatrix} 4 & -1 & -1 & -1 \\ -1 & 4 & -1 & -1 \\ -1 & -1 & 4 & -1 \\ -1 & -1 & -1 & 4 \end{bmatrix}^{-1} = \begin{bmatrix} a & b & b & b \\ b & a & b & b \\ b & b & a & b \\ b & b & b & a \end{bmatrix}.$$

What are a and b in the inverse of 6 * eye(5) − ones(5,5)?

20 Show that A = 4 * eye(4) − ones(4,4) is not invertible: Multiply A * ones(4,1).


21 There are sixteen 2 by 2 matrices whose entries are 1’s and 0’s. How many of them 
are invertible? 


Questions 22-28 are about the Gauss-Jordan method for calculating $A^{-1}$.

22 Change I into $A^{-1}$ as you reduce A to I (by row operations):


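$$\begin{bmatrix} A & I \end{bmatrix} = \begin{bmatrix} 1 & 3 & 1 & 0 \\ 2 & 7 & 0 & 1 \end{bmatrix} \quad\text{and}\quad \begin{bmatrix} A & I \end{bmatrix} = \begin{bmatrix} 1 & 4 & 1 & 0 \\ 3 & 9 & 0 & 1 \end{bmatrix}.$$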


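23 Follow the 3 by 3 text example but with plus signs in A. Eliminate above and below
the pivots to reduce [A I] to $[\,I \ \ A^{-1}\,]$:

$$\begin{bmatrix} A & I \end{bmatrix} = \begin{bmatrix} 2 & 1 & 0 & 1 & 0 & 0 \\ 1 & 2 & 1 & 0 & 1 & 0 \\ 0 & 1 & 2 & 0 & 0 & 1 \end{bmatrix}.$$

24 Use Gauss-Jordan elimination on [U I] to find the upper triangular $U^{-1}$:

$$UU^{-1} = I \qquad \begin{bmatrix} 1 & a & b \\ 0 & 1 & c \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x_1 & x_2 & x_3 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}.$$

25 Find $A^{-1}$ and $B^{-1}$ (if they exist) by elimination on [A I] and [B I]:

$$A = \begin{bmatrix} 2 & 1 & 1 \\ 1 & 2 & 1 \\ 1 & 1 & 2 \end{bmatrix} \quad\text{and}\quad B = \begin{bmatrix} 2 & -1 & -1 \\ -1 & 2 & -1 \\ -1 & -1 & 2 \end{bmatrix}.$$

26 What three matrices $E_{21}$ and $E_{12}$ and $D^{-1}$ reduce $A = \begin{bmatrix} 1 & 2 \\ 2 & 6 \end{bmatrix}$ to the identity matrix?
Multiply $D^{-1}E_{12}E_{21}$ to find $A^{-1}$.

27 Invert these matrices A by the Gauss-Jordan method starting with [A I]:

$$A = \begin{bmatrix} 1 & 0 & 0 \\ 2 & 1 & 3 \\ 0 & 0 & 1 \end{bmatrix} \quad\text{and}\quad A = \begin{bmatrix} 1 & 1 & 1 \\ 1 & 2 & 2 \\ 1 & 2 & 3 \end{bmatrix}.$$

28 Exchange rows and continue with Gauss-Jordan to find $A^{-1}$:

$$\begin{bmatrix} A & I \end{bmatrix} = \begin{bmatrix} 0 & 2 & 1 & 0 \\ 2 & 2 & 0 & 1 \end{bmatrix}.$$

29 True or false (with a counterexample if false and a reason if true):

(a) A 4 by 4 matrix with a row of zeros is not invertible.
(b) Every matrix with 1’s down the main diagonal is invertible.
(c) If A is invertible then $A^{-1}$ and $A^2$ are invertible.

30 For which three numbers c is this matrix not invertible, and why not?

$$A = \begin{bmatrix} 2 & c & c \\ c & c & c \\ 8 & 7 & c \end{bmatrix}.$$

31 Prove that A is invertible if $a \ne 0$ and $a \ne b$ (find the pivots or $A^{-1}$):

$$A = \begin{bmatrix} a & b & b \\ a & a & b \\ a & a & a \end{bmatrix}.$$

32 This matrix has a remarkable inverse. Find $A^{-1}$ by elimination on [A I]. Extend
to a 5 by 5 “alternating matrix” and guess its inverse; then multiply to confirm.

$$\text{Invert}\quad A = \begin{bmatrix} 1 & -1 & 1 & -1 \\ 0 & 1 & -1 & 1 \\ 0 & 0 & 1 & -1 \\ 0 & 0 & 0 & 1 \end{bmatrix} \quad\text{and solve}\quad Ax = (1, 1, 1, 1).$$

33 Suppose the matrices P and Q have the same rows as I but in any order. They are
“permutation matrices”. Show that P − Q is singular by solving (P − Q)x = 0.

34 Find and check the inverses (assuming they exist) of these block matrices:

$$\begin{bmatrix} I & 0 \\ C & I \end{bmatrix} \qquad \begin{bmatrix} A & 0 \\ C & D \end{bmatrix} \qquad \begin{bmatrix} 0 & I \\ I & D \end{bmatrix}.$$

35 Could a 4 by 4 matrix A be invertible if every row contains the numbers 0, 1, 2, 3 in
some order? What if every row of B contains 0, 1, 2, −3 in some order?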

36 In the Worked Example 2.5 C, the triangular Pascal matrix L has an inverse with
“alternating diagonals”. Check that this $L^{-1}$ is DLD, where the diagonal matrix
D has alternating entries 1, −1, 1, −1. Then LDLD = I, so what is the inverse of
LD = pascal(4,1)?

37 The Hilbert matrices have $H_{ij} = 1/(i + j - 1)$. Ask MATLAB for the exact 6 by
6 inverse invhilb(6). Then ask it to compute inv(hilb(6)). How can these be different,
when the computer never makes mistakes?

38 (a) Use inv(P) to invert MATLAB’s 4 by 4 symmetric matrix P = pascal(4).
(b) Create Pascal’s lower triangular L = abs(pascal(4,1)) and test $P = LL^T$.

39 If A = ones(4) and b = rand(4,1), how does MATLAB tell you that Ax = b has no
solution? For the special b = ones(4,1), which solution to Ax = b is found by A\b?

Challenge Problems

40 (Recommended) A is a 4 by 4 matrix with 1’s on the diagonal and −a, −b, −c on the
diagonal above. Find $A^{-1}$ for this bidiagonal matrix.

41 Suppose $E_1, E_2, E_3$ are 4 by 4 identity matrices, except $E_1$ has a, b, c in column 1
and $E_2$ has d, e in column 2 and $E_3$ has f in column 3 (below the 1’s). Multiply
$L = E_1E_2E_3$ to show that all these nonzeros are copied into L.

$E_1E_2E_3$ is in the opposite order from elimination (because $E_3$ is acting first). But
$E_1E_2E_3 = L$ is in the correct order to invert elimination and recover A.

42 Direct multiplications 1-4 give $MM^{-1} = I$, and I would recommend doing #3.
$M^{-1}$ shows the change in $A^{-1}$ (useful to know) when a matrix is subtracted from A:

1  $M = I - uv^T$  and  $M^{-1} = I + uv^T/(1 - v^Tu)$  (rank 1 change in I)
2  $M = A - uv^T$  and  $M^{-1} = A^{-1} + A^{-1}uv^TA^{-1}/(1 - v^TA^{-1}u)$
3  $M = I - UV$  and  $M^{-1} = I_n + U(I_m - VU)^{-1}V$
4  $M = A - UW^{-1}V$  and  $M^{-1} = A^{-1} + A^{-1}U(W - VA^{-1}U)^{-1}VA^{-1}$

The Woodbury-Morrison formula 4 is the “matrix inversion lemma” in engineering.
The Kalman filter for solving block tridiagonal systems uses formula 4 at each step.
The four matrices $M^{-1}$ are in diagonal blocks when inverting these block matrices
($v^T$ is 1 by n, u is n by 1, V is m by n, U is n by m).

$$\begin{bmatrix} I & u \\ v^T & 1 \end{bmatrix} \qquad \begin{bmatrix} A & u \\ v^T & 1 \end{bmatrix} \qquad \begin{bmatrix} I_n & U \\ V & I_m \end{bmatrix} \qquad \begin{bmatrix} A & U \\ V & W \end{bmatrix}$$

43 Second difference matrices have beautiful inverses if they start with $T_{11} = 1$
(instead of $K_{11} = 2$). Here is the 3 by 3 tridiagonal matrix T and its inverse:

$$T_{11} = 1 \qquad T = \begin{bmatrix} 1 & -1 & 0 \\ -1 & 2 & -1 \\ 0 & -1 & 2 \end{bmatrix} \qquad T^{-1} = \begin{bmatrix} 3 & 2 & 1 \\ 2 & 2 & 1 \\ 1 & 1 & 1 \end{bmatrix}$$

One approach is Gauss-Jordan elimination on [T I]. That seems too mechanical.
I would rather write T as the product of first differences L times U. The inverses of
L and U in Worked Example 2.5 A are sum matrices, so here are T and $T^{-1}$:

$$LU = \underbrace{\begin{bmatrix} 1 & 0 & 0 \\ -1 & 1 & 0 \\ 0 & -1 & 1 \end{bmatrix}}_{\text{difference}} \underbrace{\begin{bmatrix} 1 & -1 & 0 \\ 0 & 1 & -1 \\ 0 & 0 & 1 \end{bmatrix}}_{\text{difference}} \qquad U^{-1}L^{-1} = \underbrace{\begin{bmatrix} 1 & 1 & 1 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{bmatrix}}_{\text{sum}} \underbrace{\begin{bmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 1 & 1 & 1 \end{bmatrix}}_{\text{sum}}$$

Question. (4 by 4) What are the pivots of T? What is its 4 by 4 inverse?
The reverse order UL gives what matrix T*? What is the inverse of T*?

Here are two more difference matrices, both important. But are they invertible? 


2-1 0 ~i 1 -1 0 0 

. _{-l 2 -1 0 _j-l 2 -1 0 
Cyclic C = 0 -1 > -] Free ends F = 0 1 2 -] 
-1 0-1 2 0 0 -i 1 


One test is elimination—the fourth pivot fails. Another test is the determinant, 
we don’t want that. The best way is much faster, and independent of matrix size: 


Produce x Æ 0 so that Cx = 0. Do the same for Fx = 0. Not invertible. 


Show how both equations Cx = b and Fx = b lead to 0 = bi + b2 +°: + bn. 
There is no solution for other b. 
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Elimination for a 2 by 2 block matrix: When you multiply the first block row by 
CA! and subtract from the second row, the “Schur complement” S appears: 


I 0}; A Bl] _|A B A and D are square 
—-CA I|\|C D| |o S$ S = D — CA7! B. 


Multiply on the right to subtract A~! B times block column 1 from block column 2. 


2 3 3 
A Bl|{fI -—A7!B A B 
—? : — 
É s Ilo I |=: Find S for É |= i ; K 


The block pivots are A and S. If they are invertible, sois[ A B; C D]. 


How does the identity $A(I + BA) = (I + AB)A$ connect the inverses of $I + BA$
and $I + AB$? Those are both invertible or both singular: not obvious.
2.6 Elimination = Factorization: A = LU


Students often say that mathematics courses are too theoretical. Well, not this section. 
It is almost purely practical. The goal is to describe Gaussian elimination in the most 
useful way. Many key ideas of linear algebra, when you look at them closely, are really 
factorizations of a matrix. The original matrix A becomes the product of two or three 
special matrices. The first factorization—also the most important in practice—comes now 
from elimination. The factors L and U are triangular matrices. The factorization that 
comes from elimination is A = LU. 

We already know U, the upper triangular matrix with the pivots on its diagonal. The
elimination steps take A to U. We will show how reversing those steps (taking U back
to A) is achieved by a lower triangular L. The entries of L are exactly the multipliers
$\ell_{ij}$, which multiplied the pivot row j when it was subtracted from row i.

Start with a 2 by 2 example. The matrix A contains 2, 1, 6, 8. The number to eliminate
is 6. Subtract 3 times row 1 from row 2. That step is $E_{21}$ in the forward direction with
multiplier $\ell_{21} = 3$. The return step from U to A is $L = E_{21}^{-1}$ (an addition using +3):

$$\text{Forward from A to U:} \quad E_{21}A = \begin{bmatrix} 1 & 0 \\ -3 & 1 \end{bmatrix} \begin{bmatrix} 2 & 1 \\ 6 & 8 \end{bmatrix} = \begin{bmatrix} 2 & 1 \\ 0 & 5 \end{bmatrix} = U$$

$$\text{Back from U to A:} \quad E_{21}^{-1}U = \begin{bmatrix} 1 & 0 \\ 3 & 1 \end{bmatrix} \begin{bmatrix} 2 & 1 \\ 0 & 5 \end{bmatrix} = \begin{bmatrix} 2 & 1 \\ 6 & 8 \end{bmatrix} = A.$$

The second line is our factorization LU = A. Instead of $E_{21}^{-1}$ we write L. Move now to
larger matrices with many E's. Then L will include all their inverses.

Each step from A to U multiplies by a matrix $E_{ij}$ to produce zero in the (i, j) position.
To keep this clear, we stay with the most frequent case, when no row exchanges are
involved. If A is 3 by 3, we multiply by $E_{21}$ and $E_{31}$ and $E_{32}$. The multipliers $\ell_{ij}$ produce
zeros in the (2, 1) and (3, 1) and (3, 2) positions, all below the diagonal. Elimination ends
with the upper triangular U.

Now move those E's onto the other side, where their inverses multiply U:

$$(E_{32}E_{31}E_{21})A = U \quad \text{becomes} \quad A = (E_{21}^{-1}E_{31}^{-1}E_{32}^{-1})U \quad \text{which is} \quad A = LU. \tag{1}$$

The inverses go in opposite order, as they must. That product of three inverses is L.
We have reached A = LU. Now we stop to understand it.


Explanation and Examples 


First point: Every inverse matrix $E^{-1}$ is lower triangular. Its off-diagonal entry is $\ell_{ij}$,
to undo the subtraction produced by $-\ell_{ij}$. The main diagonals of $E$ and $E^{-1}$ contain 1's.
Our example above had $\ell_{21} = 3$ and $E = \begin{bmatrix} 1 & 0 \\ -3 & 1 \end{bmatrix}$ and $L = E^{-1} = \begin{bmatrix} 1 & 0 \\ 3 & 1 \end{bmatrix}$.

Second point: Equation (1) shows a lower triangular matrix (the product of the $E_{ij}$)
multiplying A. It also shows all the $E_{ij}^{-1}$ multiplying U to bring back A. This lower
triangular product of inverses is L.

One reason for working with the inverses is that we want to factor A, not U. The
"inverse form" gives A = LU. Another reason is that we get something extra, almost
more than we deserve. This is the third point, showing that L is exactly right.

Third point: Each multiplier $\ell_{ij}$ goes directly into its i, j position, unchanged, in the
product of inverses which is L. Usually matrix multiplication will mix up all the numbers.
Here that doesn't happen. The order is right for the inverse matrices, to keep the $\ell$'s
unchanged. The reason is given below in equation (3).

Since each $E^{-1}$ has 1's down its diagonal, the final good point is that L does too.


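Example 1 Elimination subtracts $\frac{1}{2}$ times row 1 from row 2. The last step subtracts $\frac{2}{3}$
times row 2 from row 3. The lower triangular L has $\ell_{21} = \frac{1}{2}$ and $\ell_{32} = \frac{2}{3}$. Multiplying
LU produces A:

$$A = \begin{bmatrix} 2 & 1 & 0 \\ 1 & 2 & 1 \\ 0 & 1 & 2 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ \frac{1}{2} & 1 & 0 \\ 0 & \frac{2}{3} & 1 \end{bmatrix} \begin{bmatrix} 2 & 1 & 0 \\ 0 & \frac{3}{2} & 1 \\ 0 & 0 & \frac{4}{3} \end{bmatrix} = LU$$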


The (3, 1) multiplier is zero because the (3, 1) entry in A is zero. No operation needed. 


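Example 2 Change the top left entry from 2 to 1. The pivots all become 1. The multipliers
are all 1. That pattern continues when A is 4 by 4:

$$\text{Special pattern} \quad A = \begin{bmatrix} 1 & 1 & 0 & 0 \\ 1 & 2 & 1 & 0 \\ 0 & 1 & 2 & 1 \\ 0 & 0 & 1 & 2 \end{bmatrix} = \begin{bmatrix} 1 & & & \\ 1 & 1 & & \\ 0 & 1 & 1 & \\ 0 & 0 & 1 & 1 \end{bmatrix} \begin{bmatrix} 1 & 1 & 0 & 0 \\ & 1 & 1 & 0 \\ & & 1 & 1 \\ & & & 1 \end{bmatrix}$$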


These LU examples are showing something extra, which is very important in practice. 
Assume no row exchanges. When can we predict zeros in L and U? 


When a row of A starts with zeros, so does that row of L. 
When a column of A starts with zeros, so does that column of U. 


If a row starts with zero, we don’t need an elimination step. L has a zero, which saves 
computer time. Similarly, zeros at the start of a column survive into U. But please realize: 
Zeros in the middle of a matrix are likely to be filled in, while elimination sweeps forward. 
We now explain why L has the multipliers $\ell_{ij}$ in position, with no mix-up.


The key reason why A equals LU: Ask yourself about the pivot rows that are subtracted
from lower rows. Are they the original rows of A? No, elimination probably changed them.
Are they rows of U? Yes, the pivot rows never change again. When computing the third
row of U, we subtract multiples of earlier rows of U (not rows of A!):

$$\text{Row 3 of } U = (\text{Row 3 of } A) - \ell_{31}(\text{Row 1 of } U) - \ell_{32}(\text{Row 2 of } U). \tag{2}$$

Rewrite this equation to see that the row $[\ell_{31} \;\; \ell_{32} \;\; 1]$ is multiplying U:

$$(\text{Row 3 of } A) = \ell_{31}(\text{Row 1 of } U) + \ell_{32}(\text{Row 2 of } U) + 1(\text{Row 3 of } U). \tag{3}$$

This is exactly row 3 of A = LU. That row of L holds $\ell_{31}$, $\ell_{32}$, 1. All rows look like this,
whatever the size of A. With no row exchanges, we have A = LU.
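
The three points above can be checked numerically. Here is a minimal Python sketch (the function name `lu_no_exchange` is ours, not one of the Teaching Codes) that runs elimination with exact fractions on the matrix of Example 1. It assumes nonzero pivots and does no row exchanges. Each multiplier is stored straight into its position in L as soon as it is computed.

```python
from fractions import Fraction

def lu_no_exchange(A):
    """Factor A = LU by elimination without row exchanges (nonzero pivots assumed)."""
    n = len(A)
    U = [[Fraction(v) for v in row] for row in A]    # working copy, ends as U
    L = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    for k in range(n):
        if U[k][k] == 0:
            raise ZeroDivisionError("zero pivot: a row exchange is needed")
        for i in range(k + 1, n):
            L[i][k] = U[i][k] / U[k][k]              # multiplier l_ik goes directly into L
            for j in range(k, n):
                U[i][j] -= L[i][k] * U[k][j]         # subtract l_ik times pivot row k of U
    return L, U

A = [[2, 1, 0], [1, 2, 1], [0, 1, 2]]                # the matrix of Example 1
L, U = lu_no_exchange(A)
pivots = [U[k][k] for k in range(3)]                 # diagonal of U
```

The rows subtracted inside the loop are rows of the working matrix that are already settled, which is exactly the statement that pivot rows are rows of U.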


Better balance The LU factorization is "unsymmetric" because U has the pivots on its
diagonal where L has 1's. This is easy to change. Divide U by a diagonal matrix D that
contains the pivots. That leaves a new matrix with 1's on the diagonal:

$$\text{Split } U \text{ into} \quad \begin{bmatrix} d_1 & & & \\ & d_2 & & \\ & & \ddots & \\ & & & d_n \end{bmatrix} \begin{bmatrix} 1 & u_{12}/d_1 & u_{13}/d_1 & \cdots \\ & 1 & u_{23}/d_2 & \cdots \\ & & \ddots & \vdots \\ & & & 1 \end{bmatrix}$$

It is convenient (but a little confusing) to keep the same letter U for this new upper
triangular matrix. It has 1's on the diagonal (like L). Instead of the normal LU, the new form
has D in the middle: Lower triangular L times diagonal D times upper triangular U.

Whenever you see LDU, it is understood that U has 1's on the diagonal. Each row is
divided by its first nonzero entry, the pivot. Then L and U are treated evenly in LDU:

$$\begin{bmatrix} 1 & 0 \\ 3 & 1 \end{bmatrix} \begin{bmatrix} 2 & 8 \\ 0 & 5 \end{bmatrix} \quad \text{splits further into} \quad \begin{bmatrix} 1 & 0 \\ 3 & 1 \end{bmatrix} \begin{bmatrix} 2 & \\ & 5 \end{bmatrix} \begin{bmatrix} 1 & 4 \\ 0 & 1 \end{bmatrix}. \tag{4}$$

The pivots 2 and 5 went into D. Dividing the rows by 2 and 5 left the rows [1 4] and
[0 1] in the new U with diagonal ones. The multiplier 3 is still in L.
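
The split in equation (4) is a one-line computation per row. A small Python sketch, with a helper name `split_pivots` chosen only for this illustration, divides each row of U by its pivot:

```python
from fractions import Fraction

def split_pivots(U):
    """Return (D, U1) with U = D * U1: D holds the pivots, U1 has 1's on its diagonal."""
    n = len(U)
    D = [[Fraction(U[i][i]) if i == j else Fraction(0) for j in range(n)] for i in range(n)]
    U1 = [[Fraction(U[i][j]) / U[i][i] for j in range(n)] for i in range(n)]   # row i divided by pivot i
    return D, U1

D, U1 = split_pivots([[2, 8], [0, 5]])    # the U from equation (4)
```

L is untouched by this step, so the multiplier 3 stays where it was.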

My own lectures sometimes stop at this point. The next paragraphs show how elimina- 
tion codes are organized, and how long they take. If MATLAB (or any software) is available, 
you can measure the computing time by just counting the seconds. 


One Square System = Two Triangular Systems 


The matrix L contains our memory of Gaussian elimination. It holds the numbers that
multiplied the pivot rows, before subtracting them from lower rows. When do we need this
record and how do we use it in solving Ax = b?

We need L as soon as there is a right side b. The factors L and U were completely
decided by the left side (the matrix A). On the right side of Ax = b, we use $L^{-1}$ and
then $U^{-1}$. That Solve step deals with two triangular matrices.

Earlier, we worked on A and b at the same time. No problem with that: just augment
to [A b]. But most computer codes keep the two sides separate. The memory of
elimination is held in L and U, to process b whenever we want to. The User's Guide to
LAPACK remarks that "This situation is so common and the savings are so important that
no provision has been made for solving a single system with just one subroutine."


How does Solve work on b? First, apply forward elimination to the right side (the
multipliers are stored in L, use them now). This changes b to a new right side c. We are
really solving Lc = b. Then back substitution solves Ux = c as always. The original
system Ax = b is factored into two triangular systems:


Forward and backward Solve Lc = b and then solve Ux = c. (5)


To see that x is correct, multiply Ux = c by L. Then LUx = Lc is just Ax = b.

To emphasize: There is nothing new about those steps. This is exactly what we have
done all along. We were really solving the triangular system Lc = b as elimination went
forward. Then back substitution produced x. An example shows what we actually did.


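Example 3 Forward elimination (downward) on Ax = b ends at Ux = c:

$$Ax = b \qquad \begin{array}{r} u + 2v = 5 \\ 4u + 9v = 21 \end{array} \quad \text{becomes} \quad \begin{array}{r} u + 2v = 5 \\ v = 1 \end{array} \qquad Ux = c$$

The multiplier was 4, which is saved in L. The right side used it to change 21 to 1:

$$Lc = b \quad \text{The lower triangular system} \quad \begin{bmatrix} 1 & 0 \\ 4 & 1 \end{bmatrix} \begin{bmatrix} c \end{bmatrix} = \begin{bmatrix} 5 \\ 21 \end{bmatrix} \quad \text{gave} \quad c = \begin{bmatrix} 5 \\ 1 \end{bmatrix}.$$

$$Ux = c \quad \text{The upper triangular system} \quad \begin{bmatrix} 1 & 2 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} x \end{bmatrix} = \begin{bmatrix} 5 \\ 1 \end{bmatrix} \quad \text{gives} \quad x = \begin{bmatrix} 3 \\ 1 \end{bmatrix}.$$

L and U can go into the $n^2$ storage locations that originally held A (now forgettable).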
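
The two triangular solves of Example 3 can be written out in a few lines. This Python sketch uses our own function names (`forward_solve`, `back_solve`) and assumes L has 1's on its diagonal and U has nonzero pivots:

```python
def forward_solve(L, b):
    """Solve Lc = b from top to bottom (L is lower triangular with unit diagonal)."""
    c = []
    for k in range(len(b)):
        s = sum(L[k][j] * c[j] for j in range(k))     # L times the earlier c(j)
        c.append(b[k] - s)
    return c

def back_solve(U, c):
    """Solve Ux = c from bottom to top (U is upper triangular with nonzero pivots)."""
    n = len(c)
    x = [0] * n
    for k in range(n - 1, -1, -1):
        t = sum(U[k][j] * x[j] for j in range(k + 1, n))   # U times the later x(j)
        x[k] = (c[k] - t) / U[k][k]                        # divide by the pivot
    return x

L = [[1, 0], [4, 1]]
U = [[1, 2], [0, 1]]
b = [5, 21]
c = forward_solve(L, b)    # the right side after forward elimination
x = back_solve(U, c)       # the solution of Ax = b
```

For this data the forward step reproduces c = (5, 1) and back substitution gives x = (3, 1), matching the example.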


The Cost of Elimination 


A very practical question is cost—or computing time. We can solve 1000 equations on a 
PC. What if n = 100,000? (Not if A is dense.) Large systems come up all the time 
in scientific computing, where a three-dimensional problem can easily lead to a million 
unknowns. We can let the calculation run overnight, but we can’t leave it for 100 years. 
The first stage of elimination, on column 1, produces zeros below the first pivot. To
find each new entry below the pivot row requires one multiplication and one subtraction.
We will count this first stage as $n^2$ multiplications and $n^2$ subtractions. It is actually less,
$n^2 - n$, because row 1 does not change.

The next stage clears out the second column below the second pivot. The working
matrix is now of size n − 1. Estimate this stage by $(n-1)^2$ multiplications and subtractions.
The matrices are getting smaller as elimination goes forward. The rough count to reach U
is the sum of squares $n^2 + (n-1)^2 + \cdots + 2^2 + 1^2$.

There is an exact formula $\frac{1}{3}n(n + \frac{1}{2})(n + 1)$ for this sum of squares. When n is large,
the $\frac{1}{2}$ and the 1 are not important. The number that matters is $\frac{1}{3}n^3$. The sum of squares is
like the integral of $x^2$! The integral from 0 to n is $\frac{1}{3}n^3$:


Elimination on A requires about $\frac{1}{3}n^3$ multiplications and $\frac{1}{3}n^3$ subtractions.


What about the right side b? Going forward, we subtract multiples of $b_1$ from the lower
components $b_2, \ldots, b_n$. This is n − 1 steps. The second stage takes only n − 2 steps,
because $b_1$ is not involved. The last stage of forward elimination takes one step.

Now start back substitution. Computing $x_n$ uses one step (divide by the last pivot). The
next unknown uses two steps. When we reach $x_1$ it will require n steps (n − 1 substitutions
of the other unknowns, then division by the first pivot). The total count on the right side,
from b to c to x (forward to the bottom and back to the top) is exactly $n^2$:

$$[(n-1) + (n-2) + \cdots + 1] + [1 + 2 + \cdots + (n-1) + n] = n^2. \tag{6}$$

To see that sum, pair off (n − 1) with 1 and (n − 2) with 2. The pairings leave n terms, each
equal to n. That makes $n^2$. The right side costs a lot less than the left side!


Solve Each right side needs $n^2$ multiplications and $n^2$ subtractions.
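
These counts are easy to confirm by tallying steps stage by stage. The Python sketch below is only a counting exercise (no matrix is touched), with function names invented for the illustration. On the left side it charges one step for each multiplier and one for each entry changed to its right, which is the "actually less" count $n^2 - n$ for the first stage.

```python
def count_factor_ops(n):
    """Multiply-subtract steps to reach U from a dense n by n matrix.

    Stage k has n - k rows below the pivot. Each row costs one step for its
    multiplier and one step for each of the n - k entries right of the pivot.
    """
    ops = 0
    for k in range(1, n):
        rows_below = n - k
        ops += rows_below * (rows_below + 1)
    return ops

def count_solve_ops(n):
    """Steps on the right side: forward elimination, then back substitution."""
    forward = sum(range(1, n))          # (n-1) + (n-2) + ... + 1
    backward = sum(range(1, n + 1))     # 1 + 2 + ... + n
    return forward + backward

factor_1000 = count_factor_ops(1000)    # left side, n = 1000
solve_1000 = count_solve_ops(1000)      # right side, n = 1000
```

Under this way of counting, the left side total is exactly $\frac{1}{3}(n^3 - n)$ and the right side total is exactly $n^2$.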


A band matrix B has only w nonzero diagonals below and also above its main diagonal.
The zero entries outside the band stay zero in elimination (zeros in L and U). Clearing out
the first column needs $w^2$ multiplications and subtractions (w zeros to be produced below
the pivot, each one using a pivot row of length w). Then clearing out all n columns, to
reach U, needs no more than $nw^2$. This saves a lot of time:


Band matrix Factor change $\frac{1}{3}n^3$ to $nw^2$ Solve change $n^2$ to $2nw$

Here are codes to factor A into LU and to solve Ax = b. The Teaching code slu 
stops right away if a number smaller than the tolerance “tol” appears in a pivot position. 
The Teaching Codes are on web.mit.edu/18.06/www. Professional codes will look down 
each column for the largest available pivot, to exchange rows and continue solving. 


MATLAB’s backslash command x = A\b combines Factor and Solve to reach x. 
function [L, U] = slu(A)
% Square LU factorization with no row exchanges!
n = size(A, 1); tol = 1.e-6;
for k = 1:n
  if abs(A(k, k)) < tol
    error('Cannot proceed without a row exchange: stop')
  end
  L(k, k) = 1;
  for i = k+1:n
    L(i, k) = A(i, k)/A(k, k);              % Multipliers for column k are put into L
    for j = k+1:n                           % Elimination beyond row k and column k
      A(i, j) = A(i, j) - L(i, k)*A(k, j);  % Matrix still called A
    end
  end
  for j = k:n
    U(k, j) = A(k, j);                      % Row k is settled, now name it U
  end
end


% Solve Ax = b using L and U from slu(A).
[L, U] = slu(A); n = length(b); s = 0; t = 0;   % No row exchanges!
for k = 1:n                                 % Forward elimination to solve Lc = b
  for j = 1:k-1
    s = s + L(k, j)*c(j);                   % Add L times earlier c(j) before c(k)
  end
  c(k) = b(k) - s; s = 0;                   % Find c(k) and reset s for next k
end
for k = n:-1:1                              % Going backwards from x(n) to x(1)
  for j = k+1:n                             % Back substitution
    t = t + U(k, j)*x(j);                   % U times later x(j)
  end
  x(k) = (c(k) - t)/U(k, k); t = 0;         % Divide by pivot and reset t for next k
end
x = x';                                     % Transpose to column vector


How long does it take to solve Ax = b? For a random matrix of order n = 1000,
a typical time is 1 second. See web.mit.edu/18.06 and math.mit.edu/linearalgebra for
the times in MATLAB, Maple, Mathematica, SciLab, Python, and R. The time is multiplied
by about 8 when n is multiplied by 2. For professional codes go to netlib.org.

According to this $n^3$ rule, matrices that are 10 times as large (order 10,000) will take a
thousand seconds. Matrices of order 100,000 will take a million seconds. This is too
expensive without a supercomputer, but remember that these matrices are full. Most matrices
in practice are sparse (many zero entries). In that case A = LU is much faster.

For tridiagonal matrices of order 10,000, storing only the nonzeros, solving Ax = b
is a breeze, provided the code recognizes that A is tridiagonal.
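
To see why the tridiagonal case is so cheap, here is a hedged Python sketch of elimination that stores only the three diagonals. The function name and argument layout are our own choices, not a Teaching Code. It assumes nonzero pivots with no row exchanges, and the work grows in proportion to n instead of $n^3$.

```python
def solve_tridiagonal(lower, diag, upper, b):
    """Solve Ax = b for a tridiagonal A stored by diagonals (no row exchanges).

    lower[i] = A[i+1][i], diag[i] = A[i][i], upper[i] = A[i][i+1].
    """
    n = len(diag)
    d = [float(v) for v in diag]          # becomes the pivots
    c = [float(v) for v in b]             # becomes the right side c in Ux = c
    for k in range(1, n):                 # forward elimination: one multiplier per row
        m = lower[k - 1] / d[k - 1]
        d[k] -= m * upper[k - 1]
        c[k] -= m * c[k - 1]
    x = [0.0] * n
    x[n - 1] = c[n - 1] / d[n - 1]
    for k in range(n - 2, -1, -1):        # back substitution with one term per row
        x[k] = (c[k] - upper[k] * x[k + 1]) / d[k]
    return x

# The -1, 2, -1 matrix of order 4, with b chosen so that x = (1, 2, 3, 4)
x = solve_tridiagonal([-1, -1, -1], [2, 2, 2, 2], [-1, -1, -1], [0, 0, 0, 5])
```

Only the diagonals are stored and touched, so order 10,000 is no harder in principle than order 4.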
■ REVIEW OF THE KEY IDEAS ■


1. Gaussian elimination (with no row exchanges) factors A into L times U.

2. The lower triangular L contains the numbers $\ell_{ij}$ that multiply pivot rows, going from
A to U. The product LU adds those rows back to recover A.

3. On the right side we solve Lc = b (forward) and Ux = c (backward).

4. Factor: There are $\frac{1}{3}(n^3 - n)$ multiplications and subtractions on the left side.

5. Solve: There are $n^2$ multiplications and subtractions on the right side.

6. For a band matrix, change $\frac{1}{3}n^3$ to $nw^2$ and change $n^2$ to $2wn$.


■ WORKED EXAMPLES ■


2.6 A The lower triangular Pascal matrix L contains the famous “Pascal triangle”. 
Gauss-Jordan found its inverse in the worked example 2.5 C. This problem connects L 
to the symmetric Pascal matrix P and the upper triangular U. The symmetric P has Pas- 
cal’s triangle tilted, so each entry is the sum of the entry above and the entry to the left. The 
n by n symmetric P is pascal(n) in MATLAB. 


Problem: Establish the amazing lower-upper factorization P = LU. 


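$$\text{pascal}(4) = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 2 & 3 & 4 \\ 1 & 3 & 6 & 10 \\ 1 & 4 & 10 & 20 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ 1 & 2 & 1 & 0 \\ 1 & 3 & 3 & 1 \end{bmatrix} \begin{bmatrix} 1 & 1 & 1 & 1 \\ 0 & 1 & 2 & 3 \\ 0 & 0 & 1 & 3 \\ 0 & 0 & 0 & 1 \end{bmatrix} = LU$$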


Then predict and check the next row and column for 5 by 5 Pascal matrices. 


Solution You could multiply LU to get P. Better to start with the symmetric P and 
reach the upper triangular U by elimination: 


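$$P = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 2 & 3 & 4 \\ 1 & 3 & 6 & 10 \\ 1 & 4 & 10 & 20 \end{bmatrix} \rightarrow \begin{bmatrix} 1 & 1 & 1 & 1 \\ 0 & 1 & 2 & 3 \\ 0 & 2 & 5 & 9 \\ 0 & 3 & 9 & 19 \end{bmatrix} \rightarrow \begin{bmatrix} 1 & 1 & 1 & 1 \\ 0 & 1 & 2 & 3 \\ 0 & 0 & 1 & 3 \\ 0 & 0 & 3 & 10 \end{bmatrix} \rightarrow \begin{bmatrix} 1 & 1 & 1 & 1 \\ 0 & 1 & 2 & 3 \\ 0 & 0 & 1 & 3 \\ 0 & 0 & 0 & 1 \end{bmatrix} = U$$

The multipliers $\ell_{ij}$ that entered these steps go perfectly into L. Then P = LU is a
particularly neat example. Notice that every pivot is 1 on the diagonal of U.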

The next section will show how symmetry produces a special relationship between the 
triangular L and U. For Pascal, U is the “transpose” of L. 
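
For readers who want to check the 5 by 5 case by machine, here is a short Python sketch. It builds the Pascal matrices from binomial coefficients (a standard description of their entries, stated here as our assumption and not taken from this page) and multiplies L by its transpose.

```python
from math import comb

def pascal_lower(n):
    """Lower triangular Pascal matrix: entry (i, j) is the binomial coefficient C(i, j)."""
    return [[comb(i, j) for j in range(n)] for i in range(n)]

def pascal_symmetric(n):
    """Symmetric Pascal matrix: entry (i, j) is C(i + j, i), counting rows and columns from 0."""
    return [[comb(i + j, i) for j in range(n)] for i in range(n)]

def matmul(A, B):
    """Plain matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

n = 5
L = pascal_lower(n)
U = [list(col) for col in zip(*L)]    # U is the transpose of L
P = pascal_symmetric(n)
product = matmul(L, U)                # should reproduce P
```

The new last row of the symmetric matrix is 1, 5, 15, 35, 70, and the new last row of L is 1, 4, 6, 4, 1.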
You might expect the MATLAB command lu(pascal(4)) to produce these L and U.
That doesn’t happen because the lu subroutine chooses the largest available pivot in each 
column. The second pivot will change from 1 to 3. But a “Cholesky factorization” does no 
row exchanges: U = chol(pascal(4)) 

The full proof of P = LU for all Pascal sizes is quite fascinating. The paper “Pascal 
Matrices” is on the course web page web.mit.edu/18.06 which is also available through 
MIT’s OpenCourseWare at ocw.mit.edu. These Pascal matrices have so many remarkable 
properties—we will see them again. 


2.6 B The problem is: Solve Px = b = (1, 0, 0, 0). This right side = column of I
means that x will be the first column of $P^{-1}$. That is Gauss-Jordan, matching the columns
of $PP^{-1} = I$. We already know the Pascal matrices L and U as factors of P:


Two triangular systems Lc = b (forward) Ux = c (back).


Solution The lower triangular system Lc = b is solved top to bottom:

$$\begin{array}{rcl} c_1 & = & 1 \\ c_1 + c_2 & = & 0 \\ c_1 + 2c_2 + c_3 & = & 0 \\ c_1 + 3c_2 + 3c_3 + c_4 & = & 0 \end{array} \quad \text{gives} \quad \begin{array}{rcl} c_1 & = & +1 \\ c_2 & = & -1 \\ c_3 & = & +1 \\ c_4 & = & -1 \end{array}$$

Forward elimination is multiplication by $L^{-1}$. It produces the upper triangular system
Ux = c. The solution x comes as always by back substitution, bottom to top:

$$\begin{array}{rcl} x_1 + x_2 + x_3 + x_4 & = & 1 \\ x_2 + 2x_3 + 3x_4 & = & -1 \\ x_3 + 3x_4 & = & 1 \\ x_4 & = & -1 \end{array} \quad \text{gives} \quad \begin{array}{rcl} x_1 & = & +4 \\ x_2 & = & -6 \\ x_3 & = & +4 \\ x_4 & = & -1 \end{array}$$


I see a pattern in that x, but I don’t know where it comes from. Try inv(pascal(4)). 


Problem Set 2.6 
Problems 1-14 compute the factorization A = LU (and also A= LDU). 
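1 (Important) Forward elimination changes $\begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix} x = b$ to a triangular $\begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix} x = c$:

$$\begin{array}{r} x + y = 5 \\ x + 2y = 7 \end{array} \rightarrow \begin{array}{r} x + y = 5 \\ y = 2 \end{array} \qquad \begin{bmatrix} 1 & 1 & 5 \\ 1 & 2 & 7 \end{bmatrix} \rightarrow \begin{bmatrix} 1 & 1 & 5 \\ 0 & 1 & 2 \end{bmatrix}$$

That step subtracted $\ell_{21}$ = ____ times row 1 from row 2. The reverse step adds
$\ell_{21}$ times row 1 to row 2. The matrix for that reverse step is L = ____. Multiply
this L times the triangular system $\begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix} x = \begin{bmatrix} 5 \\ 2 \end{bmatrix}$ to get ____ = ____. In letters,
L multiplies Ux = c to give ____.


2 Write down the 2 by 2 triangular systems Lc = b and Ux = c from Problem 1.
Check that c = (5, 2) solves the first one. Find x that solves the second one.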
3 (Move to 3 by 3) Forward elimination changes Ax = b to a triangular Ux = c:

$$\begin{array}{r} x + y + z = 5 \\ x + 2y + 3z = 7 \\ x + 3y + 6z = 11 \end{array} \qquad \begin{array}{r} x + y + z = 5 \\ y + 2z = 2 \\ 2y + 5z = 6 \end{array} \qquad \begin{array}{r} x + y + z = 5 \\ y + 2z = 2 \\ z = 2 \end{array}$$

The equation z = 2 in Ux = c comes from the original x + 3y + 6z = 11 in
Ax = b by subtracting $\ell_{31}$ = ____ times equation 1 and $\ell_{32}$ = ____ times the
final equation 2. Reverse that to recover [1 3 6 11] in the last row of A and b
from the final [1 1 1 5] and [0 1 2 2] and [0 0 1 2] in U and c:


Row 3 of [A b] = ($\ell_{31}$ Row 1 + $\ell_{32}$ Row 2 + 1 Row 3) of [U c].
In matrix notation this is multiplication by L. So A = LU and b = Lc.


4 What are the 3 by 3 triangular systems Lc = b and Ux = c from Problem 3?
Check that c = (5, 2, 2) solves the first one. Which x solves the second one?


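5 What matrix E puts A into triangular form EA = U? Multiply by $E^{-1} = L$ to
factor A into LU:

$$A = \begin{bmatrix} 2 & 1 & 0 \\ 0 & 4 & 2 \\ 6 & 3 & 5 \end{bmatrix}$$

6 What two elimination matrices $E_{21}$ and $E_{32}$ put A into upper triangular form
$E_{32}E_{21}A = U$? Multiply by $E_{32}^{-1}$ and $E_{21}^{-1}$ to factor A into $LU = E_{21}^{-1}E_{32}^{-1}U$:

$$A = \begin{bmatrix} 1 & 1 & 1 \\ 2 & 4 & 5 \\ 0 & 4 & 0 \end{bmatrix}$$

7 What three elimination matrices $E_{21}$, $E_{31}$, $E_{32}$ put A into its upper triangular form
$E_{32}E_{31}E_{21}A = U$? Multiply by $E_{32}^{-1}$, $E_{31}^{-1}$ and $E_{21}^{-1}$ to factor A into L times U:

$$A = \begin{bmatrix} 1 & 0 & 1 \\ 2 & 2 & 2 \\ 3 & 4 & 5 \end{bmatrix} \qquad L = E_{21}^{-1}E_{31}^{-1}E_{32}^{-1}.$$

8 Suppose A is already lower triangular with 1's on the diagonal. Then U = I!

$$A = L = \begin{bmatrix} 1 & 0 & 0 \\ a & 1 & 0 \\ b & c & 1 \end{bmatrix}$$

The elimination matrices $E_{21}$, $E_{31}$, $E_{32}$ contain −a then −b then −c.

(a) Multiply $E_{32}E_{31}E_{21}$ to find the single matrix E that produces EA = I.
(b) Multiply $E_{21}^{-1}E_{31}^{-1}E_{32}^{-1}$ to bring back L (nicer than E).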

9 When zero appears in a pivot position, A = LU is not possible! (We are requiring
nonzero pivots in U.) Show directly why these are both impossible:

$$\begin{bmatrix} 0 & 1 \\ 2 & 3 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ \ell & 1 \end{bmatrix} \begin{bmatrix} d & e \\ 0 & f \end{bmatrix} \qquad \begin{bmatrix} 1 & 1 & 0 \\ 1 & 1 & 2 \\ 1 & 2 & 1 \end{bmatrix} = \begin{bmatrix} 1 & & \\ \ell & 1 & \\ m & n & 1 \end{bmatrix} \begin{bmatrix} d & e & g \\ & f & h \\ & & i \end{bmatrix}$$

This difficulty is fixed by a row exchange. That needs a "permutation" P.


10 Which number c leads to zero in the second pivot position? A row exchange is
needed and A = LU will not be possible. Which c produces zero in the third pivot
position? Then a row exchange can't help and elimination fails:


11 What are L and D (the diagonal pivot matrix) for this matrix A? What is U in
A = LU and what is the new U in A = LDU?

$$\text{Already triangular} \quad A = \begin{bmatrix} 2 & 4 & 8 \\ 0 & 3 & 9 \\ 0 & 0 & 7 \end{bmatrix}$$

12 A and B are symmetric across the diagonal (because 4 = 4). Find their triple
factorizations LDU and say how U is related to L for these symmetric matrices:

$$\text{Symmetric} \quad A = \begin{bmatrix} 2 & 4 \\ 4 & 11 \end{bmatrix} \quad \text{and} \quad B = \begin{bmatrix} 1 & 4 & 0 \\ 4 & 12 & 4 \\ 0 & 4 & 0 \end{bmatrix}$$

13 (Recommended) Compute L and U for the symmetric matrix A:

$$A = \begin{bmatrix} a & a & a & a \\ a & b & b & b \\ a & b & c & c \\ a & b & c & d \end{bmatrix}$$

Find four conditions on a, b, c, d to get A = LU with four pivots.


14 This nonsymmetric matrix will have the same L as in Problem 13:

$$\text{Find } L \text{ and } U \text{ for} \quad A = \begin{bmatrix} a & r & r & r \\ a & b & s & s \\ a & b & c & t \\ a & b & c & d \end{bmatrix}$$

Find the four conditions on a, b, c, d, r, s, t to get A = LU with four pivots.
Problems 15-16 use L and U (without needing A) to solve Ax = b.


15 Solve the triangular system Lc = b to find c. Then solve Ux = c to find x:

$$L = \begin{bmatrix} 1 & 0 \\ 4 & 1 \end{bmatrix} \quad \text{and} \quad U = \begin{bmatrix} 2 & 4 \\ 0 & 1 \end{bmatrix} \quad \text{and} \quad b = \begin{bmatrix} 2 \\ 11 \end{bmatrix}.$$

For safety multiply LU and solve Ax = b as usual. Circle c when you see it.


16 Solve Lc = b to find c. Then solve Ux = c to find x. What was A?

$$L = \begin{bmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 1 & 1 & 1 \end{bmatrix} \quad \text{and} \quad U = \begin{bmatrix} 1 & 1 & 1 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{bmatrix} \quad \text{and} \quad b = \begin{bmatrix} 4 \\ 5 \\ 6 \end{bmatrix}$$

17 (a) When you apply the usual elimination steps to L, what matrix do you reach?

$$L = \begin{bmatrix} 1 & 0 & 0 \\ \ell_{21} & 1 & 0 \\ \ell_{31} & \ell_{32} & 1 \end{bmatrix}$$

(b) When you apply the same steps to I, what matrix do you get?
(c) When you apply the same steps to LU, what matrix do you get?


18 If A = LDU and also $A = L_1D_1U_1$ with all factors invertible, then $L = L_1$ and
$D = D_1$ and $U = U_1$. "The three factors are unique."
Derive the equation $L_1^{-1}LD = D_1U_1U^{-1}$. Are the two sides triangular or diagonal?
Deduce $L = L_1$ and $U = U_1$ (they all have diagonal 1's). Then $D = D_1$.


19 Tridiagonal matrices have zero entries except on the main diagonal and the two
adjacent diagonals. Factor these into A = LU and $A = LDL^T$:

$$A = \begin{bmatrix} 1 & 1 & 0 \\ 1 & 2 & 1 \\ 0 & 1 & 2 \end{bmatrix} \quad \text{and} \quad A = \begin{bmatrix} a & a & 0 \\ a & a+b & b \\ 0 & b & b+c \end{bmatrix}$$

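20 When T is tridiagonal, its L and U factors have only two nonzero diagonals. How
would you take advantage of knowing the zeros in T, in a code for Gaussian
elimination? Find L and U.

$$\text{Tridiagonal} \quad T = \begin{bmatrix} 1 & 2 & 0 & 0 \\ 2 & 3 & 1 & 0 \\ 0 & 1 & 2 & 3 \\ 0 & 0 & 3 & 4 \end{bmatrix}$$

21 If A and B have nonzeros in the positions marked by x, which zeros (marked by 0)
stay zero in their factors L and U?

$$A = \begin{bmatrix} x & x & x & x \\ x & x & x & 0 \\ 0 & x & x & x \\ 0 & 0 & x & x \end{bmatrix} \qquad B = \begin{bmatrix} x & x & x & 0 \\ x & x & 0 & x \\ x & 0 & x & x \\ 0 & x & x & x \end{bmatrix}$$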
x RHO 
22 Suppose you eliminate upwards (almost unheard of). Use the last row to produce
zeros in the last column (the pivot is 1). Then use the second row to produce zero
above the second pivot. Find the factors in the unusual order A = UL.

$$\text{Upper times lower} \quad A = \begin{bmatrix} 5 & 3 & 1 \\ 3 & 3 & 1 \\ 1 & 1 & 1 \end{bmatrix}$$

23 Easy but important. If A has pivots 5, 9, 3 with no row exchanges, what are the pivots
for the upper left 2 by 2 submatrix $A_2$ (without row 3 and column 3)?


Challenge Problems


24 Which invertible matrices allow A = LU (elimination without row exchanges)?
Good question! Look at each of the square upper left submatrices of A.

All upper left k by k submatrices $A_k$ must be invertible (sizes k = 1, ..., n).
Explain that answer: $A_k$ factors into ____ because $LU = \begin{bmatrix} L_k & 0 \\ * & * \end{bmatrix} \begin{bmatrix} U_k & * \\ 0 & * \end{bmatrix}$.


25 For the 6 by 6 second difference constant-diagonal matrix K, put the pivots and
multipliers into K = LU. (L and U will have only two nonzero diagonals, because
K has three.) Find a formula for the i, j entry of $L^{-1}$, by software like MATLAB
using inv(L) or by looking for a nice pattern.

$$-1, 2, -1 \text{ matrix} \quad K = \begin{bmatrix} 2 & -1 & & \\ -1 & \cdot & \cdot & \\ & \cdot & \cdot & -1 \\ & & -1 & 2 \end{bmatrix} = \text{toeplitz}([2 \;\; {-1} \;\; 0 \;\; 0 \;\; 0 \;\; 0])$$

26 If you print $K^{-1}$, it doesn't look so good. But if you print $7K^{-1}$ (when K is 6 by 6),
that matrix looks wonderful. Write down $7K^{-1}$ by hand, following this pattern:

1 Row 1 and column 1 are (6, 5, 4, 3, 2, 1).
2 On and above the main diagonal, row i is i times row 1.
3 On and below the main diagonal, column j is j times column 1.

Multiply K times that $7K^{-1}$ to produce 7I. Here is that pattern for n = 3:

$$\begin{array}{l} \text{3 by 3 case} \\ \text{The determinant} \\ \text{of this } K \text{ is 4} \end{array} \qquad (K)(4K^{-1}) = \begin{bmatrix} 2 & -1 & 0 \\ -1 & 2 & -1 \\ 0 & -1 & 2 \end{bmatrix} \begin{bmatrix} 3 & 2 & 1 \\ 2 & 4 & 2 \\ 1 & 2 & 3 \end{bmatrix} = \begin{bmatrix} 4 & & \\ & 4 & \\ & & 4 \end{bmatrix}$$
2.7 Transposes and Permutations


We need one more matrix, and fortunately it is much simpler than the inverse. It is the
"transpose" of A, which is denoted by $A^T$. The columns of $A^T$ are the rows of A.
When A is an m by n matrix, the transpose is n by m:

$$\text{Transpose} \quad \text{If } A = \begin{bmatrix} 1 & 2 & 3 \\ 0 & 0 & 4 \end{bmatrix} \text{ then } A^T = \begin{bmatrix} 1 & 0 \\ 2 & 0 \\ 3 & 4 \end{bmatrix}.$$

You can write the rows of A into the columns of $A^T$. Or you can write the columns of A
into the rows of $A^T$. The matrix "flips over" its main diagonal. The entry in row i, column j
of $A^T$ comes from row j, column i of the original A:

$$\text{Exchange rows and columns} \quad (A^T)_{ij} = A_{ji}$$

The transpose of a lower triangular matrix is upper triangular. (But the inverse is still lower
triangular.) The transpose of $A^T$ is A.


Note MATLAB's symbol for the transpose of A is A'. Typing [1 2 3] gives a row vector
and the column vector is v = [1 2 3]'. To enter a matrix M with second column
w = [4 5 6]' you could define M = [v w]. Quicker to enter by rows and then
transpose the whole matrix: M = [1 2 3; 4 5 6]'.


The rules for transposes are very direct. We can transpose A + B to get $(A + B)^T$.
Or we can transpose A and B separately, and then add $A^T + B^T$, with the same result.
The serious questions are about the transpose of a product AB and an inverse $A^{-1}$:


Sum The transpose of A + B is $A^T + B^T$. (1)
Product The transpose of AB is $(AB)^T = B^TA^T$. (2)
Inverse The transpose of $A^{-1}$ is $(A^{-1})^T = (A^T)^{-1}$. (3)


Notice especially how $B^TA^T$ comes in reverse order. For inverses, this reverse order
was quick to check: $B^{-1}A^{-1}$ times AB produces I. To understand $(AB)^T = B^TA^T$,
start with $(Ax)^T = x^TA^T$:


Ax combines the columns of A while $x^TA^T$ combines the rows of $A^T$.


It is the same combination of the same vectors! In A they are columns, in $A^T$ they are rows.
So the transpose of the column Ax is the row $x^TA^T$. That fits our formula $(Ax)^T = x^TA^T$.
Now we can prove the formula $(AB)^T = B^TA^T$, when B has several columns.

If $B = [x_1 \;\; x_2]$ has two columns, apply the same idea to each column. The columns
of AB are $Ax_1$ and $Ax_2$. Their transposes are the rows of $B^TA^T$:

$$\text{Transposing } AB = \begin{bmatrix} Ax_1 & Ax_2 & \cdots \end{bmatrix} \text{ gives } \begin{bmatrix} x_1^TA^T \\ x_2^TA^T \\ \vdots \end{bmatrix} \text{ which is } B^TA^T. \tag{4}$$
The right answer $B^TA^T$ comes out a row at a time. Here are numbers in $(AB)^T = B^TA^T$:


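$$AB = \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix} \begin{bmatrix} 5 & 0 \\ 4 & 1 \end{bmatrix} = \begin{bmatrix} 5 & 0 \\ 9 & 1 \end{bmatrix} \quad \text{and} \quad B^TA^T = \begin{bmatrix} 5 & 4 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} 5 & 9 \\ 0 & 1 \end{bmatrix}.$$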


The reverse order rule extends to three or more factors: $(ABC)^T$ equals $C^TB^TA^T$.
If A = LDU then $A^T = U^TD^TL^T$. The pivot matrix has $D = D^T$.
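
A quick numerical check of the reverse order rule, written as a small Python sketch. The matrix B is an arbitrary choice of ours, and A is the 2 by 3 matrix from the start of this section, so the shapes are rectangular on purpose:

```python
def transpose(A):
    """Rows of A become columns."""
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    """Plain matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

A = [[1, 2, 3], [0, 0, 4]]                     # 2 by 3
B = [[1, 0], [2, 1], [0, 5]]                   # 3 by 2, chosen arbitrarily
left = transpose(matmul(A, B))                 # (AB)^T
right = matmul(transpose(B), transpose(A))     # B^T A^T, in reverse order
```

With these shapes the other order is not even the right size: $A^TB^T$ would be 3 by 3, while $(AB)^T$ is 2 by 2.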


Now apply this product rule to both sides of $A^{-1}A = I$. On one side, $I^T$ is I. We
confirm the rule that $(A^{-1})^T$ is the inverse of $A^T$, because their product is I:


Transpose of inverse $A^{-1}A = I$ is transposed to $A^T(A^{-1})^T = I$. (5)


Similarly $AA^{-1} = I$ leads to $(A^{-1})^TA^T = I$. We can invert the transpose or we can
transpose the inverse. Notice especially: $A^T$ is invertible exactly when A is invertible.


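Example 1 The inverse of $A = \begin{bmatrix} 1 & 0 \\ 6 & 1 \end{bmatrix}$ is $A^{-1} = \begin{bmatrix} 1 & 0 \\ -6 & 1 \end{bmatrix}$. The transpose is $A^T = \begin{bmatrix} 1 & 6 \\ 0 & 1 \end{bmatrix}$.

$(A^{-1})^T$ and $(A^T)^{-1}$ are both equal to $\begin{bmatrix} 1 & -6 \\ 0 & 1 \end{bmatrix}$.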


The Meaning of Inner Products 


We know the dot product (inner product) of x and y. It is the sum of numbers $x_iy_i$.
Now we have a better way to write x · y, without using that unprofessional dot. Use
matrix notation instead:


T is inside The dot product or inner product is $x^Ty$ (1 × n)(n × 1)
T is outside The rank one product or outer product is $xy^T$ (n × 1)(1 × n)


$x^Ty$ is a number, $xy^T$ is a matrix. Quantum mechanics would write those as ⟨x|y⟩
(inner) and |x⟩⟨y| (outer). I think the world is governed by linear algebra, but physics
disguises it well. Here are examples where the inner product has meaning:


From mechanics Work = (Movements)(Forces) = $x^Tf$
From circuits Heat loss = (Voltage drops)(Currents) = $e^Ty$
From economics Income = (Quantities)(Prices) = $q^Tp$


We are really close to the heart of applied mathematics, and there is one more point to
explain. It is the deeper connection between inner products and the transpose of A.


We defined $A^T$ by flipping the matrix across its main diagonal. That's not mathematics.
There is a better way to approach the transpose. $A^T$ is the matrix that makes these two
inner products equal for every x and y:


$(Ax)^Ty = x^T(A^Ty)$ Inner product of Ax with y = Inner product of x with $A^Ty$
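Example 2 Start with $A = \begin{bmatrix} -1 & 1 & 0 \\ 0 & -1 & 1 \end{bmatrix}$, $x = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}$, $y = \begin{bmatrix} y_1 \\ y_2 \end{bmatrix}$.

On one side we have Ax multiplying y: $(x_2 - x_1)y_1 + (x_3 - x_2)y_2$.
That is the same as $x_1(-y_1) + x_2(y_1 - y_2) + x_3(y_2)$. Now x is multiplying $A^Ty$.

$$A^Ty \text{ must be } \begin{bmatrix} -y_1 \\ y_1 - y_2 \\ y_2 \end{bmatrix} \text{ which produces } A^T = \begin{bmatrix} -1 & 0 \\ 1 & -1 \\ 0 & 1 \end{bmatrix} \text{ as expected.}$$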
y2 0 1l 


Example 3 Will you allow me a little calculus? It is extremely important or I wouldn't
leave linear algebra. (This is really linear algebra for functions x(t).) The difference
matrix changes to a derivative A = d/dt. Its transpose will now come from (dx/dt, y) =
(x, −dy/dt).

The inner product changes from a finite sum of $x_ky_k$ to an integral of x(t) y(t).

$$\text{Inner product of functions} \quad x^Ty = (x, y) = \int_{-\infty}^{\infty} x(t)\,y(t)\,dt \quad \text{by definition}$$

$$\text{Transpose rule } (Ax)^Ty = x^T(A^Ty) \quad \int_{-\infty}^{\infty} \frac{dx}{dt}\,y(t)\,dt = \int_{-\infty}^{\infty} x(t)\left(-\frac{dy}{dt}\right)dt \quad \text{shows } A^T \tag{6}$$

I hope you recognize "integration by parts". The derivative moves from the first
function x(t) to the second function y(t). During that move, a minus sign appears.
This tells us that the "transpose" of the derivative is minus the derivative.

The derivative is anti-symmetric: A = d/dt and $A^T$ = −d/dt. Symmetric matrices
have $A^T = A$, anti-symmetric matrices have $A^T = -A$. In some way, the 2 by 3 difference
matrix above followed this pattern. The 3 by 2 matrix $A^T$ was minus a difference matrix.
It produced $y_1 - y_2$ in the middle component of $A^Ty$ instead of the difference $y_2 - y_1$.


Symmetric Matrices 


For a symmetric matrix, transposing A to $A^T$ produces no change. Then $A^T = A$. Its (j, i)
entry across the main diagonal equals its (i, j) entry. In my opinion, these are the most
important matrices of all.

$$\text{Symmetric matrices} \quad A = \begin{bmatrix} 1 & 2 \\ 2 & 5 \end{bmatrix} = A^T \quad \text{and} \quad D = \begin{bmatrix} 1 & 0 \\ 0 & 10 \end{bmatrix} = D^T.$$

The inverse of a symmetric matrix is also symmetric. The transpose of $A^{-1}$ is
$(A^{-1})^T = (A^T)^{-1} = A^{-1}$. That says $A^{-1}$ is symmetric (when A is invertible):

$$\text{Symmetric inverses} \quad A^{-1} = \begin{bmatrix} 5 & -2 \\ -2 & 1 \end{bmatrix} \quad \text{and} \quad D^{-1} = \begin{bmatrix} 1 & 0 \\ 0 & 0.1 \end{bmatrix}.$$

Now we produce symmetric matrices by multiplying any matrix R by $R^T$.
Symmetric Products $R^TR$ and $RR^T$ and $LDL^T$


Choose any matrix R, probably rectangular. Multiply $R^T$ times R. Then the product $R^TR$
is automatically a square symmetric matrix:


The transpose of $R^TR$ is $R^T(R^T)^T$ which is $R^TR$. (7)


That is a quick proof of symmetry for $R^TR$. We could also look at the (i, j) entry of $R^TR$.
It is the dot product of row i of $R^T$ (column i of R) with column j of R. The (j, i) entry
is the same dot product, column j with column i. So $R^TR$ is symmetric.


The matrix $RR^T$ is also symmetric. (The shapes of R and $R^T$ allow multiplication.)
But $RR^T$ is a different matrix from $R^TR$. In our experience, most scientific problems that
start with a rectangular matrix R end up with $R^TR$ or $RR^T$ or both. As in least squares.


Example 4 Multiply $R = \begin{bmatrix} -1 & 1 & 0 \\ 0 & -1 & 1 \end{bmatrix}$ and $R^T = \begin{bmatrix} -1 & 0 \\ 1 & -1 \\ 0 & 1 \end{bmatrix}$ in both orders.

$$RR^T = \begin{bmatrix} 2 & -1 \\ -1 & 2 \end{bmatrix} \quad \text{and} \quad R^TR = \begin{bmatrix} 1 & -1 & 0 \\ -1 & 2 & -1 \\ 0 & -1 & 1 \end{bmatrix} \quad \text{are both symmetric matrices.}$$

The product $R^TR$ is n by n. In the opposite order, $RR^T$ is m by m. Both are symmetric,
with positive diagonal (why?). But even if m = n, it is not very likely that $R^TR = RR^T$.
Equality can happen, but it is abnormal.


Symmetric matrices in elimination $A^T = A$ makes elimination faster, because we can
work with half the matrix (plus the diagonal). It is true that the upper triangular U is
probably not symmetric. The symmetry is in the triple product A = LDU. Remember
how the diagonal matrix D of pivots can be divided out, to leave 1's on the diagonal of both
L and U:


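$$\begin{bmatrix} 1 & 2 \\ 2 & 7 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 2 & 1 \end{bmatrix} \begin{bmatrix} 1 & 2 \\ 0 & 3 \end{bmatrix} \qquad \text{LU misses the symmetry of } A$$

$$\begin{bmatrix} 1 & 2 \\ 2 & 7 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 2 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 0 & 3 \end{bmatrix} \begin{bmatrix} 1 & 2 \\ 0 & 1 \end{bmatrix} \qquad \text{LDU captures the symmetry. Now } U \text{ is the transpose of } L.$$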


When A is symmetric, the usual form A = LDU becomes $A = LDL^T$. The final U
(with 1's on the diagonal) is the transpose of L (also with 1's on the diagonal). The
diagonal matrix D containing the pivots is symmetric by itself.


The symmetric factorization of a symmetric matrix is $A = LDL^T$.


Notice that the transpose of $LDL^T$ is automatically $(L^T)^T D^T L^T$ which is $LDL^T$ again.
The work of elimination is cut in half, from $n^3/3$ multiplications to $n^3/6$. The storage is
also cut essentially in half. We only keep L and D, not U which is just $L^T$.
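As an illustration, here is a minimal Python sketch that recovers L and D for a symmetric matrix by ordinary elimination. The function name `ldlt` is our own, exact fractions are used to avoid roundoff, and nonzero pivots are assumed (no row exchanges). For clarity it runs the full elimination, so it does not realize the halved operation count described above.

```python
from fractions import Fraction

def ldlt(A):
    """Return (L, D) with A = L D L^T for a symmetric A with nonzero pivots.

    L is a list of rows with 1's on the diagonal; D is the list of pivots.
    """
    n = len(A)
    U = [[Fraction(x) for x in row] for row in A]    # working copy
    L = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    for k in range(n):
        for i in range(k + 1, n):
            L[i][k] = U[i][k] / U[k][k]              # multiplier l_ik
            for j in range(k, n):
                U[i][j] -= L[i][k] * U[k][j]         # row i minus l_ik times row k
    D = [U[k][k] for k in range(n)]                  # pivots on the diagonal of U
    return L, D

L, D = ldlt([[1, 2],
             [2, 7]])
print(L, D)
```

For the 2 by 2 matrix above this gives the multiplier 2 and the pivots 1 and 3, matching the LDU display. Dividing each row of the final U by its pivot would reproduce $L^T$.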
Permutation Matrices


The transpose plays a special role for a permutation matrix. This matrix P has a single “1”
in every row and every column. Then $P^T$ is also a permutation matrix, maybe the same
or maybe different. Any product $P_1P_2$ is again a permutation matrix. We now create every
P from the identity matrix, by reordering the rows of I.

The simplest permutation matrix is P = I (no exchanges). The next simplest are the
row exchanges $P_{ij}$. Those are constructed by exchanging two rows i and j of I. Other
permutations reorder more rows. By doing all possible row exchanges to I, we get all
possible permutation matrices:


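Example 5 There are six 3 by 3 permutation matrices. Here they are without the zeros:

$$I = \begin{bmatrix} 1 & & \\ & 1 & \\ & & 1 \end{bmatrix} \quad P_{21} = \begin{bmatrix} & 1 & \\ 1 & & \\ & & 1 \end{bmatrix} \quad P_{32}P_{21} = \begin{bmatrix} & 1 & \\ & & 1 \\ 1 & & \end{bmatrix}$$

$$P_{31} = \begin{bmatrix} & & 1 \\ & 1 & \\ 1 & & \end{bmatrix} \quad P_{32} = \begin{bmatrix} 1 & & \\ & & 1 \\ & 1 & \end{bmatrix} \quad P_{21}P_{32} = \begin{bmatrix} & & 1 \\ 1 & & \\ & 1 & \end{bmatrix}$$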


There are n! permutation matrices of order n. The symbol n! means “n factorial,” the
product of the numbers $(1)(2)\cdots(n)$. Thus 3! = (1)(2)(3) which is 6. There will be 24
permutation matrices of order n = 4. And 120 permutations of order 5.

There are only two permutation matrices of order 2, namely $\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$ and $\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$.

Important: $P^{-1}$ is also a permutation matrix. Among the six 3 by 3 P's displayed
above, the four matrices on the left are their own inverses. The two matrices on the right
are inverses of each other. In all cases, a single row exchange is its own inverse. If we
repeat the exchange we are back to I. But for $P_{32}P_{21}$, the inverses go in opposite order
as always. The inverse is $P_{21}P_{32}$.


More important: $P^{-1}$ is always the same as $P^T$. The two matrices on the right are
transposes (and inverses) of each other. When we multiply $PP^T$, the “1” in the first row
of P hits the “1” in the first column of $P^T$ (since the first row of P is the first column of
$P^T$). It misses the ones in all the other columns. So $PP^T = I$.


Another proof of $P^T = P^{-1}$ looks at P as a product of row exchanges. Every row
exchange is its own transpose and its own inverse. $P^T$ and $P^{-1}$ both come from the
product of row exchanges in reverse order. So $P^T$ and $P^{-1}$ are the same.
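The count n! and the rule $PP^T = I$ can both be checked by brute force for small n. This is a sketch only: it builds each permutation matrix by placing the rows of I in a chosen order, using Python's `itertools.permutations`; the helper names are our own.

```python
from itertools import permutations
from math import factorial

def transpose(M):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*M)]

def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)]
            for row in X]

def permutation_matrices(n):
    """Return every n by n permutation matrix: the rows of I in each order."""
    identity = [[int(i == j) for j in range(n)] for i in range(n)]
    return [[identity[i] for i in order] for order in permutations(range(n))]

n = 3
identity = [[int(i == j) for j in range(n)] for i in range(n)]
all_P = permutation_matrices(n)

print(len(all_P) == factorial(n))                                  # True
print(all(matmul(P, transpose(P)) == identity for P in all_P))     # True
```

Changing `n` to 4 or 5 gives 24 and 120 matrices, and the product test still passes for every one of them.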


Symmetric matrices led to $A = LDL^T$. Now permutations lead to PA = LU.


The PA = LU Factorization with Row Exchanges


We sure hope you remember A = LU. It started with $A = (E_{21}^{-1} \cdots E_{ij}^{-1} \cdots)U$. Every
elimination step was carried out by an $E_{ij}$ and it was inverted by $E_{ij}^{-1}$. Those inverses were
compressed into one matrix L, bringing U back to A. The lower triangular L has 1's on
the diagonal, and the result is A = LU.

This is a great factorization, but it doesn't always work. Sometimes row exchanges
are needed to produce pivots. Then $A = (E^{-1} \cdots P^{-1} \cdots E^{-1} \cdots P^{-1} \cdots)U$. Every row
exchange is carried out by a $P_{ij}$ and inverted by that $P_{ij}$. We now compress those row
exchanges into a single permutation matrix P. This gives a factorization for every invertible
matrix A, which we naturally want.

The main question is where to collect the $P_{ij}$'s. There are two good possibilities:
do all the exchanges before elimination, or do them after the $E_{ij}$'s. The first way gives
PA = LU. The second way has a permutation matrix $P_1$ in the middle.


1. The row exchanges can be done in advance. Their product P puts the rows of A in 
the right order, so that no exchanges are needed for PA. Then PA = LU. 


2. If we hold row exchanges until after elimination, the pivot rows are in a strange order.
$P_1$ puts them in the correct triangular order in $U_1$. Then $A = L_1P_1U_1$.


PA = LU is constantly used in all computing (and in MATLAB). We will concentrate on 
this form. Most numerical analysts have never seen the other form. 


The factorization $A = L_1P_1U_1$ might be more elegant. If we mention both, it is because
the difference is not well known. Probably you will not spend a long time on either one.
Please don't. The most important case has P = I, when A equals LU with no exchanges.


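For this matrix A, exchange rows 1 and 2 to put the first pivot in its usual place.
Then go through elimination on PA:

$$A = \begin{bmatrix} 0 & 1 & 1 \\ 1 & 2 & 1 \\ 2 & 7 & 9 \end{bmatrix} \rightarrow PA = \begin{bmatrix} 1 & 2 & 1 \\ 0 & 1 & 1 \\ 2 & 7 & 9 \end{bmatrix} \xrightarrow{\ell_{31} = 2} \begin{bmatrix} 1 & 2 & 1 \\ 0 & 1 & 1 \\ 0 & 3 & 7 \end{bmatrix} \xrightarrow{\ell_{32} = 3} \begin{bmatrix} 1 & 2 & 1 \\ 0 & 1 & 1 \\ 0 & 0 & 4 \end{bmatrix}$$

The matrix PA has its rows in good order, and it factors as usual into LU:

$$P = \begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix} \qquad PA = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 2 & 3 & 1 \end{bmatrix} \begin{bmatrix} 1 & 2 & 1 \\ 0 & 1 & 1 \\ 0 & 0 & 4 \end{bmatrix} = LU. \qquad (8)$$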


We started with A and ended with U. The only requirement is invertibility of A. 
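The steps of this example can be sketched in a few lines of Python. This is an illustrative sketch, not the book's Teaching Code splu and not MATLAB's lu: the function name `plu` is hypothetical, exact fractions replace floating point, and a row exchange is made only when the entry in the pivot position is zero (the first nonzero entry below it is used).

```python
from fractions import Fraction

def plu(A):
    """Return (P, L, U) with P A = L U for an invertible square matrix A."""
    n = len(A)
    U = [[Fraction(x) for x in row] for row in A]
    L = [[Fraction(0)] * n for _ in range(n)]
    perm = list(range(n))          # perm[i] = original row now sitting in row i
    for k in range(n):
        # first row at or below row k with a nonzero entry in column k
        r = next((i for i in range(k, n) if U[i][k] != 0), None)
        if r is None:
            raise ValueError("no pivot in column %d: A is not invertible" % (k + 1))
        if r != k:
            U[k], U[r] = U[r], U[k]
            L[k], L[r] = L[r], L[k]            # carry along stored multipliers
            perm[k], perm[r] = perm[r], perm[k]
        for i in range(k + 1, n):
            L[i][k] = U[i][k] / U[k][k]        # multiplier l_ik
            for j in range(k, n):
                U[i][j] -= L[i][k] * U[k][j]
    for i in range(n):
        L[i][i] = Fraction(1)                  # 1's on the diagonal of L
    P = [[int(j == perm[i]) for j in range(n)] for i in range(n)]
    return P, L, U

P, L, U = plu([[0, 1, 1],
               [1, 2, 1],
               [2, 7, 9]])
print(P)   # [[0, 1, 0], [1, 0, 0], [0, 0, 1]]
```

For the matrix in the text this returns the same P, L, and U as equation (8). Swapping whole rows of L is safe here because only the multipliers in columns before k have been stored at the moment of the exchange.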


s for A to be. invertible: 
In MATLAB, A([r k],:) = A([k r],:) exchanges row k with row r below it (where the
kth pivot has been found). Then the lu code updates L and P and the sign of P.
This is part of [L, U, P] = lu(A):

    A([r k], :) = A([k r], :);
    L([r k], 1:k-1) = L([k r], 1:k-1);
    P([r k], :) = P([k r], :);
    sign = -sign


The “sign” of P tells whether the number of row exchanges is even (sign = +1).
An odd number of row exchanges will produce sign = -1. At the start, P is I and sign
= +1. When there is a row exchange, the sign is reversed. The final value of sign is the
determinant of P and it does not depend on the order of the row exchanges.
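The sign rule can be tried out directly. In this sketch (our own helper, with rows numbered from 0 as Python does) a permutation is stored as a list `order`, where `order[i]` names the row of I that sits in row i of P. The function sorts the list back to the identity by row exchanges and reverses the sign at each one.

```python
from itertools import permutations

def permutation_sign(order):
    """Return +1 for an even number of row exchanges, -1 for an odd number."""
    order = list(order)
    sign = 1
    for i in range(len(order)):
        while order[i] != i:
            j = order[i]
            order[i], order[j] = order[j], order[i]   # one row exchange
            sign = -sign
    return sign

print(permutation_sign([1, 0, 2]))   # -1: the single exchange P21
print(permutation_sign([1, 2, 0]))   # 1: two exchanges, as in P32 P21

even_count = sum(1 for p in permutations(range(4)) if permutation_sign(p) == 1)
print(even_count)                    # 12
```

Each exchange puts at least one entry in its final place, so the loop always ends. A different sequence of exchanges would reach the same sign, which is the point made in the text.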


For PA we get back to the familiar L U. This is the usual factorization. In reality, 
lu(A) often does not use the first available pivot. Mathematically we accept a small pivot— 
anything but zero. It is better if the computer looks down the column for the largest pivot. 
(Section 9.1 explains why this “partial pivoting” reduces the roundoff error.) Then P may 
contain row exchanges that are not algebraically necessary. Still PA = LU. 


Our advice is to understand permutations but let the computer do the work. Calculations 
of A = LU are enough to do by hand, without P. The Teaching Code splu(A) factors 
PA = LU and splv(A, b) solves Ax = b for any invertible A. The program splu stops if
no pivot can be found in column k. Then A is not invertible. 


REVIEW OF THE KEY IDEAS

1. The transpose puts the rows of A into the columns of $A^T$. Then $(A^T)_{ij} = A_{ji}$.
2. The transpose of AB is $B^TA^T$. The transpose of $A^{-1}$ is the inverse of $A^T$.
3. The dot product is $x \cdot y = x^Ty$. Then $(Ax)^Ty$ equals the dot product $x^T(A^Ty)$.
4. When A is symmetric ($A^T = A$), its LDU factorization is symmetric: $A = LDL^T$.
5. A permutation matrix P has a 1 in each row and column, and $P^T = P^{-1}$.
6. There are n! permutation matrices of size n. Half even, half odd.
7. If A is invertible then a permutation P will reorder its rows for PA = LU.


WORKED EXAMPLES


2.7 A Applying the permutation P to the rows of A destroys its symmetry:

$$P = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{bmatrix} \qquad A = \begin{bmatrix} 1 & 4 & 5 \\ 4 & 2 & 6 \\ 5 & 6 & 3 \end{bmatrix} \qquad PA = \begin{bmatrix} 4 & 2 & 6 \\ 5 & 6 & 3 \\ 1 & 4 & 5 \end{bmatrix}$$

What permutation Q applied to the columns of PA will recover symmetry in PAQ?
The numbers 1, 2, 3 must come back to the main diagonal (not necessarily in order).
Show that Q is $P^T$, so that symmetry is saved by $PAQ = PAP^T$.


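Solution To recover symmetry and put “2” back on the diagonal, column 2 of PA
must move to column 1. Column 3 of PA (containing “3”) must move to column 2.
Then the “1” moves to the 3, 3 position. The matrix that permutes columns is Q:

$$PA = \begin{bmatrix} 4 & 2 & 6 \\ 5 & 6 & 3 \\ 1 & 4 & 5 \end{bmatrix} \qquad Q = \begin{bmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix} \qquad PAQ = \begin{bmatrix} 2 & 6 & 4 \\ 6 & 3 & 5 \\ 4 & 5 & 1 \end{bmatrix} \text{ is symmetric.}$$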


The matrix Q is $P^T$. This choice always recovers symmetry, because $PAP^T$ is guaranteed
to be symmetric. (Its transpose is again $PAP^T$.) The matrix Q is also $P^{-1}$, because the
inverse of every permutation matrix is its transpose.

If D is a diagonal matrix, we are finding that $PDP^T$ is also diagonal. When P moves
row 1 down to row 3, $P^T$ on the right will move column 1 to column 3. The (1, 1) entry
moves down to (3, 1) and over to (3, 3).
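A short numerical check of this worked example, written as a plain Python sketch (the helpers `transpose` and `matmul` are our own names): PA fails the symmetry test, and multiplying by $P^T$ on the right restores it.

```python
def transpose(M):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*M)]

def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)]
            for row in X]

P = [[0, 1, 0],
     [0, 0, 1],
     [1, 0, 0]]
A = [[1, 4, 5],
     [4, 2, 6],
     [5, 6, 3]]

PA = matmul(P, A)
PAPT = matmul(PA, transpose(P))

print(PA == transpose(PA))        # False: row exchanges alone destroy symmetry
print(PAPT)                       # [[2, 6, 4], [6, 3, 5], [4, 5, 1]]
print(PAPT == transpose(PAPT))    # True
```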


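2.7 B Find the symmetric factorization $A = LDL^T$ for the matrix A above. Is this A
invertible? Find also the PQ = LU factorization for Q, which needs row exchanges.


Solution To factor A into $LDL^T$ we eliminate below the pivots:

$$A = \begin{bmatrix} 1 & 4 & 5 \\ 4 & 2 & 6 \\ 5 & 6 & 3 \end{bmatrix} \rightarrow \begin{bmatrix} 1 & 4 & 5 \\ 0 & -14 & -14 \\ 0 & -14 & -22 \end{bmatrix} \rightarrow \begin{bmatrix} 1 & 4 & 5 \\ 0 & -14 & -14 \\ 0 & 0 & -8 \end{bmatrix} = U.$$

The multipliers were $\ell_{21} = 4$ and $\ell_{31} = 5$ and $\ell_{32} = 1$. The pivots 1, -14, -8 go into D.
When we divide the rows of U by those pivots, $L^T$ should appear:

Symmetric factorization when $A = A^T$:

$$A = LDL^T = \begin{bmatrix} 1 & 0 & 0 \\ 4 & 1 & 0 \\ 5 & 1 & 1 \end{bmatrix} \begin{bmatrix} 1 & & \\ & -14 & \\ & & -8 \end{bmatrix} \begin{bmatrix} 1 & 4 & 5 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{bmatrix}$$

This matrix A is invertible because it has three pivots. Its inverse is $(L^T)^{-1}D^{-1}L^{-1}$ and
$A^{-1}$ is also symmetric. The numbers 14 and 8 will turn up in the denominators of $A^{-1}$.
The “determinant” of A is the product of the pivots (1)(-14)(-8) = 112.

Any permutation matrix Q is invertible. Here elimination needs two row exchanges:

$$Q = \begin{bmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix} \xrightarrow{\text{rows } 1 \leftrightarrow 2} \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{bmatrix} \xrightarrow{\text{rows } 2 \leftrightarrow 3} \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} = I.$$

With A = Q, the PQ = (L)(U) factorization is the same as $Q^{-1}Q = (I)(I)$.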
2.7 C For a rectangular A, this saddle-point matrix S is symmetric and important:

Block matrix from least squares

$$S = \begin{bmatrix} I & A \\ A^T & 0 \end{bmatrix} = S^T \text{ has size } m + n.$$

Apply block elimination to find a block factorization $S = LDL^T$. Then test invertibility:

$$S \text{ is invertible} \iff A^TA \text{ is invertible} \iff Ax \neq 0 \text{ whenever } x \neq 0$$

Solution The first block pivot is I. The matrix to multiply row 1 is certainly $A^T$:

Block elimination

$$S = \begin{bmatrix} I & A \\ A^T & 0 \end{bmatrix} \text{ goes to } \begin{bmatrix} I & A \\ 0 & -A^TA \end{bmatrix}. \text{ This is } U.$$

The block pivot matrix D contains I and $-A^TA$. Then L and $L^T$ contain $A^T$ and A:

Block factorization

$$S = LDL^T = \begin{bmatrix} I & 0 \\ A^T & I \end{bmatrix} \begin{bmatrix} I & 0 \\ 0 & -A^TA \end{bmatrix} \begin{bmatrix} I & A \\ 0 & I \end{bmatrix}.$$

L is certainly invertible, with diagonal 1's from I. The inverse of the middle matrix
involves $(A^TA)^{-1}$. Section 4.2 answers a key question about the matrix $A^TA$:


When is $A^TA$ invertible? Answer: A must have independent columns.
Then Ax = 0 only if x = 0. Otherwise Ax = 0 will lead to $A^TAx = 0$.
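As a check on the block factorization, the three factors can be multiplied out two at a time. This is only a verification of the formula above, using the usual rule that blocks multiply like numbers as long as their order is kept:

$$\begin{bmatrix} I & 0 \\ A^T & I \end{bmatrix} \begin{bmatrix} I & 0 \\ 0 & -A^TA \end{bmatrix} = \begin{bmatrix} I & 0 \\ A^T & -A^TA \end{bmatrix}, \qquad \begin{bmatrix} I & 0 \\ A^T & -A^TA \end{bmatrix} \begin{bmatrix} I & A \\ 0 & I \end{bmatrix} = \begin{bmatrix} I & A \\ A^T & A^TA - A^TA \end{bmatrix} = \begin{bmatrix} I & A \\ A^T & 0 \end{bmatrix} = S.$$

The lower right block cancels to zero, which is why the second block pivot must be $-A^TA$.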


Problem Set 2.7 


Questions 1-7 are about the rules for transpose matrices. 


1 Find $A^T$ and $A^{-1}$ and $(A^{-1})^T$ and $(A^T)^{-1}$ for


$A = \begin{bmatrix} 1 & 0 \\ 9 & 3 \end{bmatrix}$ and also $A = \begin{bmatrix} 1 & c \\ c & 0 \end{bmatrix}$.


2 Verify that $(AB)^T$ equals $B^TA^T$ but those are different from $A^TB^T$:


E] eB] pA 


In case AB = BA (not generally true!) how do you prove that $B^TA^T = A^TB^T$?


3 (a) The matrix $((AB)^{-1})^T$ comes from $(A^{-1})^T$ and $(B^{-1})^T$. In what order?
(b) If U is upper triangular then $(U^{-1})^T$ is ____ triangular.


4 Show that $A^2 = 0$ is possible but $A^TA = 0$ is not possible (unless A = zero matrix).


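5 (a) The row vector $x^T$ times A times the column y produces what number?

$$x^TAy = \begin{bmatrix} 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix} \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix} = \underline{\qquad}.$$

(b) This is the row $x^TA$ = ____ times the column y = (0, 1, 0).
(c) This is the row $x^T = [\,0 \;\; 1\,]$ times the column Ay = ____.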


6 The transpose of a block matrix $M = \begin{bmatrix} A & B \\ C & D \end{bmatrix}$ is $M^T$ = ____. Test an example.
Under what conditions on A, B, C, D is the block matrix symmetric? 


7 True or false: 


(a) The block matrix | ? 4] is automatically symmetric. 

(b) If A and B are symmetric then their product AB is symmetric. 
(c) If A is not symmetric then $A^{-1}$ is not symmetric.

(d) When A, B,C are symmetric, the transpose of ABC is CBA. 


Questions 8-15 are about permutation matrices. 
8 Why are there n! permutation matrices of order n? 


9 If $P_1$ and $P_2$ are permutation matrices, so is $P_1P_2$. This still has the rows of I in
some order. Give examples with $P_1P_2 \neq P_2P_1$ and $P_3P_4 = P_4P_3$.


10 There are 12 “even” permutations of (1, 2,3, 4), with an even number of exchanges. 
Two of them are (1, 2,3, 4) with no exchanges and (4, 3,2, 1) with two exchanges. 
List the other ten. Instead of writing each 4 by 4 matrix, just order the numbers. 


11 Which permutation makes PA upper triangular? Which permutations make $P_1AP_2$
lower triangular? Multiplying A on the right by $P_2$ exchanges the ____ of A.

$$A = \begin{bmatrix} 0 & 0 & 6 \\ 1 & 2 & 3 \\ 0 & 4 & 5 \end{bmatrix}$$


12 Explain why the dot product of x and y equals the dot product of Px and Py. 
Then from $(Px)^T(Py) = x^Ty$ deduce that $P^TP = I$ for any permutation. With
x = (1, 2, 3) and y = (1, 4, 2) choose P to show that $Px \cdot y$ is not always $x \cdot Py$.


13 (a) Find a 3 by 3 permutation matrix with $P^3 = I$ (but not P = I).
(b) Find a 4 by 4 permutation P with $P^4 \neq I$.


14 If P has 1's on the antidiagonal from (1, n) to (n, 1), describe PAP. Note $P = P^T$.


15 All row exchange matrices are symmetric: $P^T = P$. Then $P^TP = I$ becomes
$P^2 = I$. Other permutation matrices may or may not be symmetric.

(a) If P sends row 1 to row 4, then $P^T$ sends row ____ to row ____.
When $P^T = P$ the row exchanges come in pairs with no overlap.

(b) Find a 4 by 4 example with $P^T = P$ that moves all four rows.


Questions 16-21 are about symmetric matrices and their factorizations. 


16 If $A = A^T$ and $B = B^T$, which of these matrices are certainly symmetric?
(a) $A^2 - B^2$ (b) (A + B)(A - B) (c) ABA (d) ABAB.


17 Find 2 by 2 symmetric matrices $A = A^T$ with these properties:

(a) A is not invertible.
(b) A is invertible but cannot be factored into LU (row exchanges needed).
(c) A can be factored into $LDL^T$ but not into $LL^T$ (because of negative D).


18 (a) How many entries of A can be chosen independently, if $A = A^T$ is 5 by 5?
(b) How do L and D (still 5 by 5) give the same number of choices in $LDL^T$?
(c) How many entries can be chosen if A is skew-symmetric? ($A^T = -A$).


19 Suppose R is rectangular (m by n) and A is symmetric (m by m).

(a) Transpose $R^TAR$ to show its symmetry. What shape is this matrix?
(b) Show why $R^TR$ has no negative numbers on its diagonal.


20 Factor these symmetric matrices into $A = LDL^T$. The pivot matrix D is diagonal:


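$A = \begin{bmatrix} 1 & 3 \\ 3 & 2 \end{bmatrix}$ and $A = \begin{bmatrix} 1 & b \\ b & c \end{bmatrix}$ and $A = \begin{bmatrix} 2 & -1 & 0 \\ -1 & 2 & -1 \\ 0 & -1 & 2 \end{bmatrix}$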


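21 After elimination clears out column 1 below the first pivot, find the symmetric 2 by
2 matrix that appears in the lower right corner:

Start from $A = \begin{bmatrix} 2 & 4 & 8 \\ 4 & 3 & 9 \\ 8 & 9 & 0 \end{bmatrix}$ and $A = \begin{bmatrix} 1 & b & c \\ b & d & e \\ c & e & f \end{bmatrix}$.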
Questions 22-24 are about the factorizations PA = LU and $A = L_1P_1U_1$.


22 Find the PA = LU factorizations (and check them) for

$A = \begin{bmatrix} 0 & 1 & 1 \\ 1 & 0 & 1 \\ 2 & 3 & 4 \end{bmatrix}$ and $A = \begin{bmatrix} 1 & 2 & 0 \\ 2 & 4 & 1 \\ 1 & 1 & 1 \end{bmatrix}$.


23 Find a 4 by 4 permutation matrix (call it A) that needs 3 row exchanges to reach the
end of elimination. For this matrix, what are its factors P, L, and U?


24 Factor the following matrix into PA = LU. Factor it also into $A = L_1P_1U_1$
(hold the exchange of row 3 until 3 times row 1 is subtracted from row 2):

$$A = \begin{bmatrix} 0 & 1 & 2 \\ 0 & 3 & 8 \\ 2 & 1 & 1 \end{bmatrix}$$


25 Extend the slu code in Section 2.6 to a code splu that factors PA into LU. 


26 Prove that the identity matrix cannot be the product of three row exchanges (or five). 
It can be the product of two exchanges (or four). 


27 (a) Choose $E_{21}$ to remove the 3 below the first pivot. Then multiply $E_{21}AE_{21}^T$ to
remove both 3's:

$A = \begin{bmatrix} 1 & 3 & 0 \\ 3 & 11 & 4 \\ 0 & 4 & 9 \end{bmatrix}$ is going toward $D = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 1 \end{bmatrix}$.

(b) Choose $E_{32}$ to remove the 4 below the second pivot. Then A is reduced to D
by $E_{32}E_{21}AE_{21}^TE_{32}^T = D$. Invert the E's to find L in $A = LDL^T$.


28 If every row of a 4 by 4 matrix contains the numbers 0, 1, 2, 3 in some order, can the
matrix be symmetric? 


29 Prove that no reordering of rows and reordering of columns can transpose a typical 
matrix. (Watch the diagonal entries.) 


The next three questions are about applications of the identity $(Ax)^Ty = x^T(A^Ty)$.


30 Wires go between Boston, Chicago, and Seattle. Those cities are at voltages $x_B$, $x_C$,
$x_S$. With unit resistances between cities, the currents between cities are in y:

$$y = Ax \text{ is } \begin{bmatrix} y_{BC} \\ y_{CS} \\ y_{BS} \end{bmatrix} = \begin{bmatrix} 1 & -1 & 0 \\ 0 & 1 & -1 \\ 1 & 0 & -1 \end{bmatrix} \begin{bmatrix} x_B \\ x_C \\ x_S \end{bmatrix}$$

(a) Find the total currents $A^Ty$ out of the three cities.
(b) Verify that $(Ax)^Ty$ agrees with $x^T(A^Ty)$: six terms in both.


31 Producing $x_1$ trucks and $x_2$ planes needs $x_1 + 50x_2$ tons of steel, $40x_1 + 1000x_2$
pounds of rubber, and $2x_1 + 50x_2$ months of labor. If the unit costs $y_1, y_2, y_3$ are
\$700 per ton, \$3 per pound, and \$3000 per month, what are the values of one truck
and one plane? Those are the components of $A^Ty$.


32 Ax gives the amounts of steel, rubber, and labor to produce x in Problem 31. Find A.
Then $Ax \cdot y$ is the ____ of inputs while $x \cdot A^Ty$ is the value of ____.


33 The matrix P that multiplies (x, y, z) to give (z, x, y) is also a rotation matrix.
Find P and $P^3$. The rotation axis a = (1, 1, 1) doesn't move, it equals Pa.
What is the angle of rotation from v = (2, 3, -5) to Pv = (-5, 2, 3)?


34 Write A = [12] as the product EH of an elementary row operation matrix E and a
symmetric matrix H.


35 Here is a new factorization of A into triangular (with 1's) times symmetric:
Start from A = LDU. Then A = $L(U^T)^{-1}$ times $U^TDU$.
Why is $L(U^T)^{-1}$ triangular? Its diagonal is all 1's. Why is $U^TDU$ symmetric?


36 A group of matrices includes AB and $A^{-1}$ if it includes A and B. “Products and
inverses stay in the group.” Which of these sets are groups?
Lower triangular matrices L with 1's on the diagonal, symmetric matrices S,
positive matrices M, diagonal invertible matrices D, permutation matrices P,
matrices with $Q^T = Q^{-1}$. Invent two more matrix groups.


Challenge Problems


37 A square northwest matrix B is zero in the southeast corner, below the antidiagonal
that connects (1, n) to (n, 1). Will $B^T$ and $B^2$ be northwest matrices? Will $B^{-1}$ be
northwest or southeast? What is the shape of BC = northwest times southeast?


38 If you take powers of a permutation matrix, why is some $P^k$ eventually equal to I?
Find a 5 by 5 permutation P so that the smallest power to equal I is $P^6$.


39 (a) Write down any 3 by 3 matrix A. Split A into B + C where $B = B^T$ is
symmetric and $C = -C^T$ is anti-symmetric.
(b) Find formulas for B and C involving A and $A^T$. We want A = B + C with
$B = B^T$ and $C = -C^T$.


40 Suppose $Q^T$ equals $Q^{-1}$ (transpose equals inverse, so $Q^TQ = I$).

(a) Show that the columns $q_1, \ldots, q_n$ are unit vectors: $\|q_i\|^2 = 1$.
(b) Show that every two columns of Q are perpendicular: $q_1^Tq_2 = 0$.
(c) Find a 2 by 2 example with first entry $q_{11} = \cos\theta$.


Chapter 3 


Vector Spaces and Subspaces 


3.1 Spaces of Vectors 


To a newcomer, matrix calculations involve a lot of numbers. To you, they involve vectors. 
The columns of Ax and AB are linear combinations of n vectors—the columns of A. 
This chapter moves from numbers and vectors to a third level of understanding (the highest 
level). Instead of individual columns, we look at “spaces” of vectors. Without seeing vector 
spaces and especially their subspaces, you haven’t understood everything about Ax = b. 

Since this chapter goes a little deeper, it may seem a little harder. That is natural. We 
are looking inside the calculations, to find the mathematics. The author’s job is to make it 
clear. The chapter ends with the “Fundamental Theorem of Linear Algebra”. 

We begin with the most important vector spaces. They are denoted by $\mathbf{R}^1$, $\mathbf{R}^2$, $\mathbf{R}^3$,
$\mathbf{R}^4$, .... Each space $\mathbf{R}^n$ consists of a whole collection of vectors. $\mathbf{R}^5$ contains all column
vectors with five components. This is called “5-dimensional space”.


DEFINITION The space $\mathbf{R}^n$ consists of all column vectors v with n components.


The components of v are real numbers, which is the reason for the letter R. A vector whose
n components are complex numbers lies in the space $\mathbf{C}^n$.


The vector space $\mathbf{R}^2$ is represented by the usual xy plane. Each vector v in $\mathbf{R}^2$ has two
components. The word “space” asks us to think of all those vectors, the whole plane.
Each vector gives the x and y coordinates of a point in the plane: v = (x, y).

Similarly the vectors in $\mathbf{R}^3$ correspond to points (x, y, z) in three-dimensional space.
The one-dimensional space $\mathbf{R}^1$ is a line (like the x axis). As before, we print vectors as a
column between brackets, or along a line using commas and parentheses:

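$\begin{bmatrix} 4 \\ \pi \end{bmatrix}$ is in $\mathbf{R}^2$, (1, 1, 0, 1, 1) is in $\mathbf{R}^5$, $\begin{bmatrix} 1+i \\ 1-i \end{bmatrix}$ is in $\mathbf{C}^2$.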
The great thing about linear algebra is that it deals easily with five-dimensional space. 
We don’t draw the vectors, we just need the five numbers (or n numbers). 
To multiply v by 7, multiply every component by 7. Here 7 is a “scalar”. To add vectors
in $\mathbf{R}^5$, add them a component at a time. The two essential vector operations go on inside
the vector space, and they produce linear combinations:


We can add any vectors in $\mathbf{R}^n$, and we can multiply any vector v by any scalar c.


“Inside the vector space” means that the result stays in the space. If v is the vector in $\mathbf{R}^4$
with components 1, 0, 0, 1, then 2v is the vector in $\mathbf{R}^4$ with components 2, 0, 0, 2. (In this
case 2 is the scalar.) A whole series of properties can be verified in $\mathbf{R}^n$. The commutative
law is v + w = w + v; the distributive law is c(v + w) = cv + cw. There is a unique
“zero vector” satisfying 0 + v = v. Those are three of the eight conditions listed at the
start of the problem set.

These eight conditions are required of every vector space. There are vectors other than
column vectors, and vector spaces other than $\mathbf{R}^n$, and all vector spaces have to obey the
eight reasonable rules.

A real vector space is a set of “vectors” together with rules for vector addition and for
multiplication by real numbers. The addition and the multiplication must produce vectors
that are in the space. And the eight conditions must be satisfied (which is usually no
problem). Here are three vector spaces other than $\mathbf{R}^n$:

M The vector space of all real 2 by 2 matrices.
F The vector space of all real functions f(x).
Z The vector space that consists only of a zero vector.


In M the “vectors” are really matrices. In F the vectors are functions. In Z the only addition 
is 0+ 0 = 0. In each case we can add: matrices to matrices, functions to functions, zero 
vector to zero vector. We can multiply a matrix by 4 or a function by 4 or the zero vector 
by 4. The result is still in M or F or Z. The eight conditions are all easily checked. 

The function space F is infinite-dimensional. A smaller function space is P, or $P_n$,
containing all polynomials $a_0 + a_1x + \cdots + a_nx^n$ of degree n.

The space Z is zero-dimensional (by any reasonable definition of dimension). It is the
smallest possible vector space. We hesitate to call it $\mathbf{R}^0$, which means no components:
you might think there was no vector. The vector space Z contains exactly one vector (zero).
No space can do without that zero vector. Each space has its own zero vector: the zero
matrix, the zero function, the vector (0, 0, 0) in $\mathbf{R}^3$.


Subspaces 


At different times, we will ask you to think of matrices and functions as vectors. But at all 
times, the vectors that we need most are ordinary column vectors. They are vectors with 
n components—but maybe not all of the vectors with n components. There are important 
vector spaces inside $\mathbf{R}^n$. Those are subspaces of $\mathbf{R}^n$.

Start with the usual three-dimensional space $\mathbf{R}^3$. Choose a plane through the origin
(0,0, 0). That plane is a vector space in its own right. If we add two vectors in the plane, 
their sum is in the plane. If we multiply an in-plane vector by 2 or —5, it is still in the plane. 
Figure 3.1: “Four-dimensional” matrix space M. The “zero-dimensional” space Z.


A plane in three-dimensional space is not $\mathbf{R}^2$ (even if it looks like $\mathbf{R}^2$). The vectors have
three components and they belong to $\mathbf{R}^3$. The plane is a vector space inside $\mathbf{R}^3$.


This illustrates one of the most fundamental ideas in linear algebra. The plane going
through (0, 0, 0) is a subspace of the full vector space $\mathbf{R}^3$.


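DEFINITION A subspace of a vector space is a set of vectors (including 0) that satisfies
two requirements: If v and w are vectors in the subspace and c is any scalar, then

(i) v + w is in the subspace
(ii) cv is in the subspace.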


In other words, the set of vectors is “closed” under addition v + w and multiplication cv 
(and cw). Those operations leave us in the subspace. We can also subtract, because —w is 
in the subspace and its sum with v is v — w. In short, all linear combinations stay in the 
subspace. 


All these operations follow the rules of the host space, so the eight required conditions 
are automatic. We just have to check the requirements for a subspace, so that we can take 
linear combinations. 


First fact: Every subspace contains the zero vector. The plane in $\mathbf{R}^3$ has to go through
(0, 0, 0). We mention this separately, for extra emphasis, but it follows directly from rule (ii).
Choose c = 0, and the rule requires 0v to be in the subspace.


Planes that don't contain the origin fail those tests. When v is on such a plane, -v
and 0v are not on the plane. A plane that misses the origin is not a subspace.


Lines through the origin are also subspaces. When we multiply by 5, or add two 
vectors on the line, we stay on the line. But the line must go through (0, 0, 0). 
Another subspace is all of $\mathbf{R}^3$. The whole space is a subspace (of itself). Here is a list
of all the possible subspaces of $\mathbf{R}^3$:


(L) Any line through (0, 0, 0)     ($\mathbf{R}^3$) The whole space
(P) Any plane through (0, 0, 0)    (Z) The single vector (0, 0, 0)


If we try to keep only part of a plane or line, the requirements for a subspace don't hold.
Look at these examples in $\mathbf{R}^2$.


Example 1 Keep only the vectors (x, y) whose components are positive or zero (this is
a quarter-plane). The vector (2, 3) is included but (-2, -3) is not. So rule (ii) is violated
when we try to multiply by c = -1. The quarter-plane is not a subspace.


Example 2 Include also the vectors whose components are both negative. Now we have 
two quarter-planes. Requirement (ii) is satisfied; we can multiply by any c. But rule (i) 
now fails. The sum of v = (2,3) and w = (—3,—2) is (—1, 1), which is outside the 
quarter-planes. Two quarter-planes don’t make a subspace. 
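Examples 1 and 2 can be tested with a few lines of Python. The membership functions below are our own illustration of the two sets (vectors on the axes are counted as inside both); each print shows one rule failing, using the same vectors as the text.

```python
def in_quarter_plane(v):
    """Example 1: both components are positive or zero."""
    return v[0] >= 0 and v[1] >= 0

def in_two_quarter_planes(v):
    """Example 2: both components >= 0, or both components <= 0."""
    return (v[0] >= 0 and v[1] >= 0) or (v[0] <= 0 and v[1] <= 0)

def add(v, w):
    """Vector addition, one component at a time."""
    return (v[0] + w[0], v[1] + w[1])

def scale(c, v):
    """Multiply every component of v by the scalar c."""
    return (c * v[0], c * v[1])

v, w = (2, 3), (-3, -2)

# Rule (ii) fails for one quarter-plane: -1 times (2, 3) leaves the set.
print(in_quarter_plane(v), in_quarter_plane(scale(-1, v)))     # True False

# Rule (i) fails for two quarter-planes: v + w = (-1, 1) leaves the set.
print(in_two_quarter_planes(v), in_two_quarter_planes(w),
      in_two_quarter_planes(add(v, w)))                        # True True False
```

A single counterexample is enough to show that a set is not a subspace. Passing a few spot checks would not prove that a set is one.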

Rules (i) and (ii) involve vector addition v + w and multiplication by scalars like c and 
d. The rules can be combined into a single requirement—the rule for subspaces: 


A subspace containing v and w must contain all linear combinations cv + dw.


Example 3 Inside the vector space M of all 2 by 2 matrices, here are two subspaces: 


(U) All upper triangular matrices $\begin{bmatrix} a & b \\ 0 & d \end{bmatrix}$ (D) All diagonal matrices $\begin{bmatrix} a & 0 \\ 0 & d \end{bmatrix}$.


Add any two matrices in U, and the sum is in U. Add diagonal matrices, and the sum is 
diagonal. In this case D is also a subspace of U! Of course the zero matrix is in these 
subspaces, when a, b, and d all equal zero. 

To find a smaller subspace of diagonal matrices, we could require a = d. The matrices
are multiples of the identity matrix I. The sum 2I + 3I is in this subspace, and so is 3
times 4I. The matrices cI form a “line of matrices” inside M and U and D.

Is the matrix I a subspace by itself? Certainly not. Only the zero matrix is. Your mind
will invent more subspaces of 2 by 2 matrices—write them down for Problem 5. 


The Column Space of A 


The most important subspaces are tied directly to a matrix A. We are trying to solve 
Ax = b. If A is not invertible, the system is solvable for some b and not solvable for 
other b. We want to describe the good right sides b: the vectors that can be written as A
times some vector x. Those b's form the “column space” of A. 

Remember that Ax is a combination of the columns of A. To get every possible b, we 
use every possible x. So start with the columns of A, and take all their linear combinations. 
This produces the column space of A. It is a vector space made up of column vectors. 


C (A) contains not just the n columns of A, but all their combinations Ax. 
This column space is crucial to the whole book, and here is why. To solve Ax = b is
to express b as a combination of the columns. The right side b has to be in the column 
space produced by A on the left side, or no solution! 


When b is in the column space, it is a combination of the columns. The coefficients in 
that combination give us a solution x to the system Ax = b. 

Suppose A is an m by n matrix. Its columns have m components (not n). So the
columns belong to $\mathbf{R}^m$. The column space of A is a subspace of $\mathbf{R}^m$ (not $\mathbf{R}^n$). The set
of all column combinations Ax satisfies rules (i) and (ii) for a subspace: When we add
linear combinations or multiply by scalars, we still produce combinations of the columns.
The word “subspace” is justified by taking all linear combinations.

Here is a 3 by 2 matrix A, whose column space is a subspace of $\mathbf{R}^3$. The column space
of A is a plane in Figure 3.2.


Example 4

$$Ax \text{ is } \begin{bmatrix} 1 & 0 \\ 4 & 3 \\ 2 & 3 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \text{ which is } x_1 \begin{bmatrix} 1 \\ 4 \\ 2 \end{bmatrix} + x_2 \begin{bmatrix} 0 \\ 3 \\ 3 \end{bmatrix}.$$


Figure 3.2: The column space C(A) is a plane containing the two columns. Ax = b is
solvable when b is on that plane. Then b is a combination of the columns.
The column space of all combinations of the two columns fills up a plane in $\mathbf{R}^3$.
We drew one particular b (a combination of the columns). This b = Ax lies on the plane.
The plane has zero thickness, so most right sides b in $\mathbf{R}^3$ are not in the column space. For
most b there is no solution to our 3 equations in 2 unknowns.

Of course (0, 0, 0) is in the column space. The plane passes through the origin. There
is certainly a solution to Ax = 0. That solution, always available, is x = 0.

To repeat, the attainable right sides b are exactly the vectors in the column space. One
possibility is the first column itself: take $x_1 = 1$ and $x_2 = 0$. Another combination is the
second column: take $x_1 = 0$ and $x_2 = 1$. The new level of understanding is to see all
combinations: the whole subspace is generated by those two columns.
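Whether a given b lies in the column space can be tested by elimination. The sketch below is one possible approach, not the book's method at this point: it counts pivots for A and for A with b appended as an extra column. If appending b produces no new pivot, then b is a combination of the columns. The names `count_pivots` and `in_column_space` are hypothetical, and exact fractions keep the zero tests reliable.

```python
from fractions import Fraction

def count_pivots(M):
    """Count the pivots found by elimination (row exchanges allowed)."""
    rows = [[Fraction(x) for x in row] for row in M]
    r = 0                                   # next pivot row
    for c in range(len(rows[0])):
        p = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if p is None:
            continue                        # no pivot in this column
        rows[r], rows[p] = rows[p], rows[r]
        for i in range(r + 1, len(rows)):
            m = rows[i][c] / rows[r][c]
            rows[i] = [a - m * b for a, b in zip(rows[i], rows[r])]
        r += 1
        if r == len(rows):
            break
    return r

def in_column_space(A, b):
    """True when Ax = b is solvable: appending b adds no new pivot."""
    augmented = [list(row) + [bi] for row, bi in zip(A, b)]
    return count_pivots(augmented) == count_pivots(A)

A = [[1, 0],
     [4, 3],
     [2, 3]]                                # the matrix of Example 4

print(in_column_space(A, [1, 7, 5]))        # True: column 1 plus column 2
print(in_column_space(A, [0, 0, 0]))        # True: the plane contains the origin
print(in_column_space(A, [1, 0, 0]))        # False: this b is off the plane
```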


Notation The column space of A is denoted by C(A). Start with the columns and take all
their linear combinations. We might get the whole $\mathbf{R}^m$ or only a subspace.


Important Instead of columns in $\mathbf{R}^m$, we could start with any set S of vectors in a vector
space V. To get a subspace SS of V, we take all combinations of the vectors in that set:


S = set of vectors in V (probably not a subspace)
SS = all combinations of vectors in S
SS = all $c_1v_1 + \cdots + c_Nv_N$ = the subspace of V “spanned” by S


When S is the set of columns, SS is the column space. When there is only one nonzero 
vector v in S, the subspace SS is the line through v. Always SS is the smallest subspace 
containing S. This is a fundamental way to create subspaces and we will come back to it. 


The subspace SS is the “span” of S, containing all combinations of vectors in S. 


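Example 5 Describe the column spaces (they are subspaces of $\mathbf{R}^2$) for

$$I = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \quad \text{and} \quad A = \begin{bmatrix} 1 & 2 \\ 2 & 4 \end{bmatrix} \quad \text{and} \quad B = \begin{bmatrix} 1 & 2 & 3 \\ 0 & 0 & 4 \end{bmatrix}.$$


Solution The column space of I is the whole space $\mathbf{R}^2$. Every vector is a combination of
the columns of I. In vector space language, C(I) is $\mathbf{R}^2$.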

The column space of A is only a line. The second column (2, 4) is a multiple of the first 
column (1,2). Those vectors are different, but our eye is on vector spaces. The column 
space contains (1,2) and (2, 4) and all other vectors (c, 2c) along that line. The equation 
Ax = b is only solvable when b is on the line. 

For the third matrix (with three columns) the column space C(B) is all of $\mathbf{R}^2$. Every
b is attainable. The vector b = (5, 4) is column 2 plus column 3, so x can be (0, 1, 1).
The same vector (5, 4) is also 2(column 1) + column 3, so another possible x is (2, 0, 1).
This matrix has the same column space as I: any b is allowed. But now x has extra
components and there are more solutions—more combinations that give b. 
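The two solutions named for B are quick to confirm. This small sketch (with our own helper `matvec`) multiplies B by each x and shows that both products equal b = (5, 4).

```python
def matvec(A, x):
    """Return Ax: each component is a row of A dotted with x."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

B = [[1, 2, 3],
     [0, 0, 4]]

print(matvec(B, [0, 1, 1]))   # [5, 4]: column 2 plus column 3
print(matvec(B, [2, 0, 1]))   # [5, 4]: 2(column 1) plus column 3
```

The difference of the two solutions, (2, -1, 0), is sent by B to the zero vector.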


The next section creates a vector space N(A), to describe all the solutions of Ax = 0. 
This section created the column space C (A), to describe all the attainable right sides b. 
= REVIEW OF THE KEY IDEAS =

1. R^n contains all column vectors with n real components.

2. M (2 by 2 matrices) and F (functions) and Z (zero vector alone) are vector spaces.

3. A subspace containing v and w must contain all their combinations cv + dw.

4. The combinations of the columns of A form the column space C(A). Then the
column space is “spanned” by the columns.


5. Ax = b has a solution exactly when b is in the column space of A. 


= WORKED EXAMPLES = 


3.1 A We are given three different vectors b1, b2, b3. Construct a matrix so that the
equations Ax = b1 and Ax = b2 are solvable, but Ax = b3 is not solvable. How can you
decide if this is possible? How could you construct A?

Solution We want to have b1 and b2 in the column space of A. Then Ax = b1 and
Ax = b2 will be solvable. The quickest way is to make b1 and b2 the two columns of A.
Then the solutions are x = (1, 0) and x = (0, 1).

Also, we don’t want Ax = b3 to be solvable. So don’t make the column space any
larger! Keeping only the columns b1 and b2, the question is:

$$ \text{Is } Ax = \begin{bmatrix} b_1 & b_2 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = b_3 \text{ solvable? Is } b_3 \text{ a combination of } b_1 \text{ and } b_2\text{?} $$

If the answer is no, we have the desired matrix A. If the answer is yes, then it is not possible
to construct A. When the column space contains b1 and b2, it will have to contain all their
linear combinations. So b3 would necessarily be in that column space and Ax = b3 would
necessarily be solvable.


3.1 B Describe a subspace S of each vector space V, and then a subspace SS of S.

V1 = all combinations of (1, 1, 0, 0) and (1, 1, 1, 0) and (1, 1, 1, 1)
V2 = all vectors perpendicular to u = (1, 2, 1), so u · v = 0
V3 = all symmetric 2 by 2 matrices (a subspace of M)
V4 = all solutions to the equation d^4y/dx^4 = 0 (a subspace of F)

Describe each V two ways: All combinations of ...., all solutions of the equations ....
Solution V1 starts with three vectors. A subspace S comes from all combinations of the
first two vectors (1, 1, 0, 0) and (1, 1, 1, 0). A subspace SS of S comes from all multiples
(c, c, 0, 0) of the first vector. So many possibilities.

A subspace S of V2 is the line through (1, -1, 1). This line is perpendicular to u. The
vector x = (0, 0, 0) is in S and all its multiples cx give the smallest subspace SS = Z.

The diagonal matrices are a subspace S of the symmetric matrices. The multiples cI
are a subspace SS of the diagonal matrices.

V4 contains all cubic polynomials y = a + bx + cx^2 + dx^3, with d^4y/dx^4 = 0.
The quadratic polynomials give a subspace S. The linear polynomials are one choice of
SS. The constants could be SSS.

In all four parts we could take S = V itself, and SS = the zero subspace Z.

Each V can be described as all combinations of .... and as all solutions of ....:

V1 = all combinations of the 3 vectors. V1 = all solutions of v1 - v2 = 0.
V2 = all combinations of (1, 0, -1) and (1, -1, 1). V2 = all solutions of u · v = 0.
V3 = all combinations of $\begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}$, $\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$, $\begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}$. V3 = all solutions $\begin{bmatrix} a & b \\ c & d \end{bmatrix}$ of b = c.
V4 = all combinations of 1, x, x^2, x^3. V4 = all solutions to d^4y/dx^4 = 0.


Problem Set 3.1 


The first problems 1-8 are about vector spaces in general. The vectors in those spaces
are not necessarily column vectors. In the definition of a vector space, vector addition
x + y and scalar multiplication cx must obey the following eight rules:

(1) x + y = y + x
(2) x + (y + z) = (x + y) + z
(3) There is a unique “zero vector” such that x + 0 = x for all x
(4) For each x there is a unique vector -x such that x + (-x) = 0
(5) 1 times x equals x
(6) (c1c2)x = c1(c2x)
(7) c(x + y) = cx + cy
(8) (c1 + c2)x = c1x + c2x.

1 Suppose (x1, x2) + (y1, y2) is defined to be (x1 + y2, x2 + y1). With the usual
multiplication cx = (cx1, cx2), which of the eight conditions are not satisfied?

2 Suppose the multiplication cx is defined to produce (cx1, 0) instead of (cx1, cx2).
With the usual addition in R^2, are the eight conditions satisfied?
3 (a) Which rules are broken if we keep only the positive numbers x > 0 in R^1?
Every c must be allowed. The half-line is not a subspace.
(b) The positive numbers with x + y and cx redefined to equal the usual xy and
x^c do satisfy the eight rules. Test rule 7 when c = 3, x = 2, y = 1. (Then
x + y = 2 and cx = 8.) Which number acts as the “zero vector”?

4 The matrix $A = \begin{bmatrix} 2 & -2 \\ 2 & -2 \end{bmatrix}$ is a “vector” in the space M of all 2 by 2 matrices. Write
down the zero vector in this space, the vector (1/2)A, and the vector -A. What matrices
are in the smallest subspace containing A?

5 (a) Describe a subspace of M that contains $A = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}$ but not $B = \begin{bmatrix} 0 & 0 \\ 0 & -1 \end{bmatrix}$.
(b) If a subspace of M contains A and B, must it contain I?
(c) Describe a subspace of M that contains no nonzero diagonal matrices.

6 The functions f(x) = x^2 and g(x) = 5x are “vectors” in F. This is the vector
space of all real functions. (The functions are defined for -∞ < x < ∞.) The
combination 3f(x) - 4g(x) is the function h(x) = ____.

7 Which rule is broken if multiplying f(x) by c gives the function f(cx)? Keep the
usual addition f(x) + g(x).

8 If the sum of the “vectors” f(x) and g(x) is defined to be the function f(g(x)),
then the “zero vector” is g(x) = x. Keep the usual scalar multiplication cf(x) and
find two rules that are broken.


Questions 9-18 are about the “subspace requirements”: x + y and cx (and then all 
linear combinations cx + dy) stay in the subspace. 


9 One requirement can be met while the other fails. Show this by finding

(a) A set of vectors in R^2 for which x + y stays in the set but (1/2)x may be outside.
(b) A set of vectors in R^2 (other than two quarter-planes) for which every cx stays
in the set but x + y may be outside.

10 Which of the following subsets of R^3 are actually subspaces?

(a) The plane of vectors (b1, b2, b3) with b1 = b2.
(b) The plane of vectors with b1 = 1.
(c) The vectors with b1b2b3 = 0.
(d) All linear combinations of v = (1, 4, 0) and w = (2, 2, 2).
(e) All vectors that satisfy b1 + b2 + b3 = 0.
(f) All vectors with b1 ≤ b2 ≤ b3.

11 Describe the smallest subspace of the matrix space M that contains


o [o ofe[o o] œ foa] [0 oelo 2} 
12 Let P be the plane in R^3 with equation x + y - 2z = 4. The origin (0, 0, 0) is not
in P! Find two vectors in P and check that their sum is not in P.

13 Let P0 be the plane through (0, 0, 0) parallel to the previous plane P. What is the
equation for P0? Find two vectors in P0 and check that their sum is in P0.

14 The subspaces of R^3 are planes, lines, R^3 itself, or Z containing only (0, 0, 0).

(a) Describe the three types of subspaces of R^2.
(b) Describe all subspaces of D, the space of 2 by 2 diagonal matrices.

15 (a) The intersection of two planes through (0, 0, 0) is probably a ____ but it could
be a ____. It can’t be Z!

(b) The intersection of a plane through (0, 0, 0) with a line through (0, 0, 0) is probably
a ____ but it could be a ____.

(c) If S and T are subspaces of R^5, prove that their intersection S ∩ T is a
subspace of R^5. Here S ∩ T consists of the vectors that lie in both subspaces.
Check the requirements on x + y and cx.

16 Suppose P is a plane through (0, 0, 0) and L is a line through (0, 0, 0). The smallest
vector space containing both P and L is either ____ or ____.

17 (a) Show that the set of invertible matrices in M is not a subspace.
(b) Show that the set of singular matrices in M is not a subspace.

18 True or false (check addition in each case by an example):

(a) The symmetric matrices in M (with A^T = A) form a subspace.
(b) The skew-symmetric matrices in M (with A^T = -A) form a subspace.
(c) The unsymmetric matrices in M (with A^T ≠ A) form a subspace.


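Questions 19-27 are about column spaces C(A) and the equation Ax = b.

19 Describe the column spaces (lines or planes) of these particular matrices:

$$ A = \begin{bmatrix} 1 & 2 \\ 0 & 0 \\ 0 & 0 \end{bmatrix} \quad\text{and}\quad B = \begin{bmatrix} 1 & 0 \\ 0 & 2 \\ 0 & 0 \end{bmatrix} \quad\text{and}\quad C = \begin{bmatrix} 1 & 0 \\ 2 & 0 \\ 0 & 0 \end{bmatrix}. $$

20 For which right sides (find a condition on b1, b2, b3) are these systems solvable?

$$ \text{(a)}\ \begin{bmatrix} 1 & 4 & 2 \\ 2 & 8 & 4 \\ -1 & -4 & -2 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix} \qquad \text{(b)}\ \begin{bmatrix} 1 & 4 \\ 2 & 9 \\ -1 & -4 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix} $$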
21 Adding row 1 of A to row 2 produces B. Adding column 1 to column 2 produces C.
A combination of the columns of (B or C ?) is also a combination of the columns of
A. Which two matrices have the same column ____?


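$$ A = \begin{bmatrix} 1 & 2 \\ 2 & 4 \end{bmatrix} \quad\text{and}\quad B = \begin{bmatrix} 1 & 2 \\ 3 & 6 \end{bmatrix} \quad\text{and}\quad C = \begin{bmatrix} 1 & 3 \\ 2 & 6 \end{bmatrix}. $$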


22 For which vectors (b1, b2, b3) do these systems have a solution?

$$ \begin{bmatrix} 1 & 1 & 1 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix} \quad\text{and}\quad \begin{bmatrix} 1 & 1 & 1 \\ 0 & 1 & 1 \\ 0 & 0 & 0 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix} $$

$$ \text{and}\quad \begin{bmatrix} 1 & 1 & 1 \\ 0 & 0 & 1 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix} $$

23 (Recommended) If we add an extra column b to a matrix A, then the column space
gets larger unless ____. Give an example where the column space gets larger and
an example where it doesn’t. Why is Ax = b solvable exactly when the column
space doesn’t get larger (it is the same for A and [A b])?

24 The columns of AB are combinations of the columns of A. This means: The column
space of AB is contained in (possibly equal to) the column space of A. Give an
example where the column spaces of A and AB are not equal.

25 Suppose Ax = b and Ay = b* are both solvable. Then Az = b + b* is solvable.
What is z? This translates into: If b and b* are in the column space C(A), then
b + b* is in C(A).

26 If A is any 5 by 5 invertible matrix, then its column space is ____. Why?

27 True or false (with a counterexample if false):

(a) The vectors b that are not in the column space C(A) form a subspace.
(b) If C(A) contains only the zero vector, then A is the zero matrix.
(c) The column space of 2A equals the column space of A.
(d) The column space of A - I equals the column space of A (test this).

28 Construct a 3 by 3 matrix whose column space contains (1, 1, 0) and (1, 0, 1) but not
(1, 1, 1). Construct a 3 by 3 matrix whose column space is only a line.

29 If the 9 by 12 system Ax = b is solvable for every b, then C(A) = ____.
Challenge Problems

30 Suppose S and T are two subspaces of a vector space V.

(a) Definition: The sum S + T contains all sums s + t of a vector s in S and a
vector t in T. Show that S + T satisfies the requirements (addition and scalar
multiplication) for a vector space.

(b) If S and T are lines in R^m, what is the difference between S + T and S ∪ T?
That union contains all vectors from S or T or both. Explain this statement:
The span of S ∪ T is S + T. (Section 3.5 returns to this word “span”.)

31 If S is the column space of A and T is C(B), then S + T is the column space of what
matrix M? The columns of A and B and M are all in R^m. (I don’t think A + B is
always a correct M.)

32 Show that the matrices A and [A AB] (with extra columns) have the same column
space. But find a square matrix with C(A^2) smaller than C(A). Important point:

An n by n matrix has C(A) = R^n exactly when A is an ____ matrix.
3.2 The Nullspace of A: Solving Ax = 0


This section is about the subspace containing all solutions to Ax = 0. The m by n matrix 
A can be square or rectangular. One immediate solution is x = 0. For invertible matrices 
this is the only solution. For other matrices, not invertible, there are nonzero solutions to 
Ax = 0. Each solution x belongs to the nullspace of A. 

Elimination will find all solutions and identify this very important subspace. 


The nullspace of A consists of all solutions to Ax = 0.
The nullspace containing all solutions of Ax = 0 is denoted by N(A).


Check that the solution vectors form a subspace. Suppose x and y are in the nullspace (this
means Ax = 0 and Ay = 0). The rules of matrix multiplication give A(x + y) = 0 + 0.
The rules also give A(cx) = c0. The right sides are still zero. Therefore x + y and cx are
also in the nullspace N(A). Since we can add and multiply without leaving the nullspace,
it is a subspace.

To repeat: The solution vectors x have n components. They are vectors in R^n, so the
nullspace is a subspace of R^n. The column space C(A) is a subspace of R^m.

If the right side b is not zero, the solutions of Ax = b do not form a subspace. The 
vector x = 0 is only a solution if b = 0. When the set of solutions does not include x = 0, 
it cannot be a subspace. Section 3.4 will show how the solutions to Ax = b (if there are 
any solutions) are shifted away from the origin by one particular solution. 


Example 1 x + 2y + 3z = 0 comes from the 1 by 3 matrix A = [1 2 3]. This
equation Ax = 0 produces a plane through the origin (0, 0, 0). The plane is a subspace of
R^3. It is the nullspace of A.

The solutions to x + 2y + 3z = 6 also form a plane, but not a subspace. 


Example 2 Describe the nullspace of $A = \begin{bmatrix} 1 & 2 \\ 3 & 6 \end{bmatrix}$. This matrix is singular!

Solution Apply elimination to the linear equations Ax = 0:

$$ \begin{aligned} x_1 + 2x_2 &= 0 \\ 3x_1 + 6x_2 &= 0 \end{aligned} \quad\longrightarrow\quad \begin{aligned} x_1 + 2x_2 &= 0 \\ 0 &= 0 \end{aligned} $$

There is really only one equation. The second equation is the first equation multiplied by
3. In the row picture, the line x1 + 2x2 = 0 is the same as the line 3x1 + 6x2 = 0. That
line is the nullspace N(A). It contains all solutions (x1, x2).

To describe this line of solutions, here is an efficient way. Choose one point on the line
(one “special solution”). Then all points on the line are multiples of this one. We choose
the second component to be x2 = 1 (a special choice). From the equation x1 + 2x2 = 0,
the first component must be x1 = -2. The special solution s is (-2, 1):

Special solution The nullspace of A contains all multiples of s = (-2, 1).
This is the best way to describe the nullspace, by computing special solutions to Ax = 0.
This example has one special solution and the nullspace is a line. 

The nullspace consists of all combinations of the special solutions. 


The plane x + 2y + 3z = 0 in Example 1 had two special solutions:

$$ \begin{bmatrix} 1 & 2 & 3 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \end{bmatrix} = 0 \ \text{ has the special solutions } \ s_1 = \begin{bmatrix} -2 \\ 1 \\ 0 \end{bmatrix} \ \text{ and } \ s_2 = \begin{bmatrix} -3 \\ 0 \\ 1 \end{bmatrix}. $$

Those vectors s1 and s2 lie on the plane x + 2y + 3z = 0, which is the nullspace of
A = [1 2 3]. All vectors on the plane are combinations of s1 and s2.

Notice what is special about s1 and s2. They have ones and zeros in the last two
components. Those components are “free” and we choose them specially. Then the first
components -2 and -3 are determined by the equation Ax = 0.

The first column of A = [1 2 3] contains the pivot, so the first component of x is
not free. The free components correspond to columns without pivots. This description of
special solutions will be completed after one more example.

The special choice (one or zero) is only for the free variables. 
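
As a quick numerical check of the plane example, the short Python sketch below multiplies A = [1 2 3] by a few combinations of the two special solutions. The helper names `matvec` and `combo` are hypothetical, chosen only for this illustration, and the coefficient pairs are arbitrary.

```python
def matvec(A, x):
    """Multiply the matrix A (a list of rows) by the vector x."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def combo(c, s1, d, s2):
    """Form the combination c*s1 + d*s2."""
    return [c * u + d * v for u, v in zip(s1, s2)]

A = [[1, 2, 3]]
s1 = [-2, 1, 0]   # free variables y = 1, z = 0
s2 = [-3, 0, 1]   # free variables y = 0, z = 1

for c, d in [(1, 0), (0, 1), (4, -7), (-2, 5)]:
    x = combo(c, s1, d, s2)
    print(x, "->", matvec(A, x))   # every product is [0]
```

A handful of samples does not prove that the combinations fill the whole plane; it only confirms that each one stays in the nullspace.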


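Example 3 Describe the nullspaces of these three matrices A, B, C:

$$ A = \begin{bmatrix} 1 & 2 \\ 3 & 8 \end{bmatrix} \qquad B = \begin{bmatrix} A \\ 2A \end{bmatrix} = \begin{bmatrix} 1 & 2 \\ 3 & 8 \\ 2 & 4 \\ 6 & 16 \end{bmatrix} \qquad C = \begin{bmatrix} A & 2A \end{bmatrix} = \begin{bmatrix} 1 & 2 & 2 & 4 \\ 3 & 8 & 6 & 16 \end{bmatrix} $$

Solution The equation Ax = 0 has only the zero solution x = 0. The nullspace is Z.
It contains only the single point x = 0 in R^2. This comes from elimination:

$$ \begin{bmatrix} 1 & 2 \\ 3 & 8 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix} \ \text{ yields } \ \begin{bmatrix} 1 & 2 \\ 0 & 2 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix} \ \text{ and } \ \begin{aligned} x_1 &= 0 \\ x_2 &= 0 \end{aligned} $$

A is invertible. There are no special solutions. All columns of this A have pivots.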

The rectangular matrix B has the same nullspace Z. The first two equations in Bx = 0 
again require x = 0. The last two equations would also force x = 0. When we add 
extra equations, the nullspace certainly cannot become larger. The extra rows impose more 
conditions on the vectors x in the nullspace. 

The rectangular matrix C is different. It has extra columns instead of extra rows. The
solution vector x has four components. Elimination will produce pivots in the first two
columns of C, but the last two columns are “free”. They don’t have pivots:

$$ C = \begin{bmatrix} 1 & 2 & 2 & 4 \\ 3 & 8 & 6 & 16 \end{bmatrix} \ \text{ becomes } \ U = \begin{bmatrix} 1 & 2 & 2 & 4 \\ 0 & 2 & 0 & 4 \end{bmatrix} $$

Columns 1 and 2 are the pivot columns. Columns 3 and 4 are the free columns.

For the free variables x3 and x4, we make special choices of ones and zeros. First x3 = 1,
x4 = 0 and second x3 = 0, x4 = 1. The pivot variables x1 and x2 are determined by the
equation Ux = 0. We get two special solutions in the nullspace of C (which is also the
nullspace of U). The special solutions are s1 and s2:

$$ s_1 = \begin{bmatrix} -2 \\ 0 \\ 1 \\ 0 \end{bmatrix} \ \text{ and } \ s_2 = \begin{bmatrix} 0 \\ -2 \\ 0 \\ 1 \end{bmatrix} \qquad \begin{matrix} \leftarrow \text{pivot} \\ \leftarrow \text{variables} \\ \leftarrow \text{free} \\ \leftarrow \text{variables} \end{matrix} $$


One more comment to anticipate what is coming soon. Elimination will not stop at the 
upper triangular U! We can continue to make this matrix simpler, in two ways: 


1. Produce zeros above the pivots

2. Produce ones in the pivots.


Those steps don’t change the zero vector on the right side of the equation. The nullspace 
stays the same. This nullspace becomes easiest to see when we reach the reduced row 
echelon form R. It has J in the pivot columns: 


Reduced form R

$$ U = \begin{bmatrix} 1 & 2 & 2 & 4 \\ 0 & 2 & 0 & 4 \end{bmatrix} \ \text{ becomes } \ R = \begin{bmatrix} 1 & 0 & 2 & 0 \\ 0 & 1 & 0 & 2 \end{bmatrix} $$

Now the pivot columns contain I.

I subtracted row 2 of U from row 1, and then multiplied row 2 by 1/2. The original two
equations have simplified to x1 + 2x3 = 0 and x2 + 2x4 = 0.

The first special solution is still s1 = (-2, 0, 1, 0), and s2 is also unchanged. Special
solutions are much easier to find from the reduced system Rx = 0.

Before moving to m by n matrices A and their nullspaces N(A) and special solutions,
allow me to repeat one comment. For many matrices, the only solution to Ax = 0 is x = 0.
Their nullspaces N(A) = Z contain only that zero vector. The only combination of the
columns that produces b = 0 is then the “zero combination” or “trivial combination”.
The solution is trivial (just x = 0) but the idea is not trivial.

This case of a zero nullspace Z is of the greatest importance. It says that the columns 
of A are independent. No combination of columns gives the zero vector (except the zero 
combination). All columns have pivots, and no columns are free. You will see this idea of 
independence again ... 


Solving Ax = 0 by Elimination 


This is important. A is rectangular and we still use elimination. We solve m equations in 
n unknowns when b = 0. After A is simplified by row operations, we read off the solution 
(or solutions). Remember the two stages (forward and back) in solving Ax = 0: 
1. Forward elimination takes A to a triangular U (or its reduced form R).

2. Back substitution in Ux = 0 or Rx = 0 produces x.


You will notice a difference in back substitution, when A and U have fewer than n 
pivots. We are allowing all matrices in this chapter, not just the nice ones (which are 
square matrices with inverses). 

Pivots are still nonzero. The columns below the pivots are still zero. But it might 
happen that a column has no pivot. That free column doesn’t stop the calculation. Go on 
to the next column. The first example is a 3 by 4 matrix with two pivots: 


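$$ A = \begin{bmatrix} 1 & 1 & 2 & 3 \\ 2 & 2 & 8 & 10 \\ 3 & 3 & 10 & 13 \end{bmatrix} $$

$$ A \rightarrow \begin{bmatrix} 1 & 1 & 2 & 3 \\ 0 & 0 & 4 & 4 \\ 0 & 0 & 4 & 4 \end{bmatrix} \qquad \begin{matrix} \\ \text{(subtract 2} \times \text{row 1)} \\ \text{(subtract 3} \times \text{row 1)} \end{matrix} $$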


The second column has a zero in the pivot position. We look below the zero for a nonzero 
entry, ready to do a row exchange. The entry below that position is also zero. Elimination 
can do nothing with the second column. This signals trouble, which we expect anyway for 
a rectangular matrix. There is no reason to quit, and we go on to the third column. 

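The second pivot is 4 (but it is in the third column). Subtracting row 2 from row 3 clears
out that column below the pivot. The pivot columns are 1 and 3:

$$ \text{Triangular } U: \qquad U = \begin{bmatrix} 1 & 1 & 2 & 3 \\ 0 & 0 & 4 & 4 \\ 0 & 0 & 0 & 0 \end{bmatrix} $$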


The fourth column also has a zero in the pivot position—but nothing can be done. There 
is no row below it to exchange, and forward elimination is complete. The matrix has three 
rows, four columns, and only two pivots. The original Ax = 0 seemed to involve three 
different equations, but the third equation is the sum of the first two. It is automatically 
satisfied (0 = 0) when the first two equations are satisfied. Elimination reveals the inner 
truth about a system of equations. Soon we push on from U to R. 
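
The forward stage just described, including the step of moving on when a column offers no pivot, can be sketched in Python. This is a minimal illustration under stated assumptions, not the book's Teaching Code: the function name `forward_eliminate` is hypothetical, arithmetic is done with exact fractions, and columns are numbered from 0 (so the book's pivot columns 1 and 3 appear as 0 and 2).

```python
from fractions import Fraction

def forward_eliminate(rows):
    """Reduce A to an echelon matrix U; return (U, pivot_columns), 0-based."""
    U = [[Fraction(v) for v in row] for row in rows]
    pivot_cols = []
    r = 0  # row where the next pivot should appear
    for col in range(len(U[0])):
        if r == len(U):
            break  # no rows left for further pivots
        # search this column, at or below row r, for a nonzero entry
        p = next((i for i in range(r, len(U)) if U[i][col] != 0), None)
        if p is None:
            continue  # free column: go on to the next column
        U[r], U[p] = U[p], U[r]  # row exchange (does nothing when p == r)
        for i in range(r + 1, len(U)):
            factor = U[i][col] / U[r][col]
            U[i] = [a - factor * b for a, b in zip(U[i], U[r])]
        pivot_cols.append(col)
        r += 1
    return U, pivot_cols

A = [[1, 1, 2, 3], [2, 2, 8, 10], [3, 3, 10, 13]]
U, pivot_cols = forward_eliminate(A)
for row in U:
    print([int(v) for v in row])
print("pivot columns (0-based):", pivot_cols)   # [0, 2]
```

For this A the printed rows are [1, 1, 2, 3], [0, 0, 4, 4] and [0, 0, 0, 0], matching the triangular U in the text.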

Now comes back substitution, to find all solutions to Ux = 0. With four unknowns 
and only two pivots, there are many solutions. The question is how to write them all down. 
A good method is to separate the pivot variables from the free variables. 


The pivot variables are x1 and x3.

The free variables are x2 and x4.

The free variables x2 and x4 can be given any values whatsoever. Then back substitution
finds the pivot variables x1 and x3. (In Chapter 2 no variables were free. When A is
invertible, all variables are pivot variables.) The simplest choices for the free variables are
ones and zeros. Those choices give the special solutions.

Special solutions to x1 + x2 + 2x3 + 3x4 = 0 and 4x3 + 4x4 = 0
• Set x2 = 1 and x4 = 0. By back substitution x3 = 0. Then x1 = -1.
• Set x2 = 0 and x4 = 1. By back substitution x3 = -1. Then x1 = -1.


These special solutions solve Ux = 0 and therefore Ax = 0. They are in the nullspace. 
The good thing is that every solution is a combination of the special solutions. 


$$ \text{Complete solution} \qquad x = x_2 \begin{bmatrix} -1 \\ 1 \\ 0 \\ 0 \end{bmatrix} + x_4 \begin{bmatrix} -1 \\ 0 \\ -1 \\ 1 \end{bmatrix} = \begin{bmatrix} -x_2 - x_4 \\ x_2 \\ -x_4 \\ x_4 \end{bmatrix} $$

The two column vectors are the special solutions. Their combination is the complete solution.

Please look again at that answer. It is the main goal of this section. The vector s1 =
(-1, 1, 0, 0) is the special solution when x2 = 1 and x4 = 0. The second special solution
has x2 = 0 and x4 = 1. All solutions are linear combinations of s1 and s2. The special
solutions are in the nullspace N(A), and their combinations fill out the whole nullspace.

The MATLAB code nullbasis computes these special solutions. They go into the columns
of a nullspace matrix N. The complete solution to Ax = 0 is a combination of those
columns. Once we have the special solutions, we have the whole nullspace.

There is a special solution for each free variable. If no variables are free—this means 
there are n pivots—then the only solution to Ux = 0 and Ax = 0 is the trivial solution 
x = 0. All variables are pivot variables. In that case the nullspaces of A and U contain 
only the zero vector. With no free variables, and pivots in every column, the output from 
nullbasis is an empty matrix. The nullspace with n pivots is Z. 


Example 4 Find the nullspace of $U = \begin{bmatrix} 1 & 5 & 7 \\ 0 & 0 & 9 \end{bmatrix}$.

The second column of U has no pivot. So x2 is free. The special solution has x2 = 1. Back
substitution into 9x3 = 0 gives x3 = 0. Then x1 + 5x2 = 0 or x1 = -5. The solutions to
Ux = 0 are multiples of one special solution:

$$ x = x_2 \begin{bmatrix} -5 \\ 1 \\ 0 \end{bmatrix} $$

The nullspace of U is a line in R^3.
It contains multiples of the special solution s = (-5, 1, 0).
One variable is free, and N = nullbasis(U) has one column s.

In a minute elimination will get zeros above the pivots and ones in the pivots.
By continuing elimination on U, the 7 is removed and the pivot changes from 9 to 1.
The final result will be the reduced row echelon form R:

$$ U = \begin{bmatrix} 1 & 5 & 7 \\ 0 & 0 & 9 \end{bmatrix} \ \text{ reduces to } \ R = \begin{bmatrix} 1 & 5 & 0 \\ 0 & 0 & 1 \end{bmatrix} = \text{rref}(U). $$

This makes it even clearer that the special solution (column of N) is s = (-5, 1, 0).


Echelon Matrices 


Forward elimination goes from A to U. It acts by row operations, including row exchanges. 
It goes on to the next column when no pivot is available in the current column. The m by n 
“staircase” U is an echelon matrix. 

Here is a 4 by 7 echelon matrix with the three pivots marked p:

$$ U = \begin{bmatrix} p & x & x & x & x & x & x \\ 0 & p & x & x & x & x & x \\ 0 & 0 & 0 & 0 & 0 & p & x \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix} $$

Question What are the column space and the nullspace for this matrix?

Answer The columns have four components so they lie in R^4. (Not in R^3!) The fourth
component of every column is zero. Every combination of the columns (every vector
in the column space) has fourth component zero. The column space C(U) consists of
all vectors of the form (b1, b2, b3, 0). For those vectors we can solve Ux = b by back
substitution. These vectors b are all possible combinations of the seven columns.

The nullspace N(U) is a subspace of R^7. The solutions to Ux = 0 are all the
combinations of the four special solutions, one for each free variable:

1. Columns 3, 4, 5, 7 have no pivots. So the free variables are x3, x4, x5, x7.
2. Set one free variable to 1 and set the other free variables to zero.
3. Solve Ux = 0 for the pivot variables x1, x2, x6.


4. This gives one of the four special solutions in the nullspace matrix N. 


The nonzero rows of an echelon matrix go down in a staircase pattern. The pivots are 
the first nonzero entries in those rows. There is a column of zeros below every pivot. 

Counting the pivots leads to an extremely important theorem. Suppose A has more 
columns than rows. With n > m there is at least one free variable. The system Ax = 0 
has at least one special solution. This solution is not zero! 
A short wide matrix (n > m) always has nonzero vectors in its nullspace. There must be at
least n — m free variables, since the number of pivots cannot exceed m. (The matrix only 
has m rows, and a row never has two pivots.) Of course a row might have no pivot—which 
means an extra free variable. But here is the point: When there is a free variable, it can be 
set to 1. Then the equation Ax = 0 has a nonzero solution. 

To repeat: There are at most m pivots. With n > m, the system Ax = 0 has a 
nonzero solution. Actually there are infinitely many solutions, since any multiple cx is 
also a solution. The nullspace contains at least a line of solutions. With two free variables, 
there are two special solutions and the nullspace is even larger. 

The nullspace is a subspace. Its “dimension” is the number of free variables. This 
central idea—the dimension of a subspace—is defined and explained in this chapter. 


The Reduced Row Echelon Matrix R 


From an echelon matrix U we go one more step. Continue with a 3 by 4 example: 


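$$ U = \begin{bmatrix} 1 & 1 & 2 & 3 \\ 0 & 0 & 4 & 4 \\ 0 & 0 & 0 & 0 \end{bmatrix} $$

We can divide the second row by 4. Then both pivots equal 1. We can subtract 2 times this
new row [0 0 1 1] from the row above. The reduced row echelon matrix R has zeros
above the pivots as well as below:

$$ \text{Reduced row echelon matrix} \qquad R = \begin{bmatrix} 1 & 1 & 0 & 1 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 0 \end{bmatrix} \qquad \text{Pivot rows contain } I $$

R has 1’s as pivots. Zeros above pivots come from upward elimination.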
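
The step from U to R can be sketched as follows. This is an illustrative Python helper, not a routine from the book: the name `reduce_echelon` is hypothetical, and the input is assumed to be already in echelon form, so the first nonzero entry of each row is its pivot.

```python
from fractions import Fraction

def reduce_echelon(U):
    """Take an echelon matrix U to reduced row echelon form R."""
    R = [[Fraction(v) for v in row] for row in U]
    for r in range(len(R)):
        # the pivot is the first nonzero entry of row r (if there is one)
        col = next((j for j, v in enumerate(R[r]) if v != 0), None)
        if col is None:
            continue  # zero row: nothing to do
        pivot = R[r][col]
        R[r] = [v / pivot for v in R[r]]      # produce a one in the pivot
        for i in range(r):                    # produce zeros above the pivot
            factor = R[i][col]
            R[i] = [a - factor * b for a, b in zip(R[i], R[r])]
    return R

U = [[1, 1, 2, 3], [0, 0, 4, 4], [0, 0, 0, 0]]
for row in reduce_echelon(U):
    print([int(v) for v in row])
```

For this U the rows printed are [1, 1, 0, 1], [0, 0, 1, 1] and [0, 0, 0, 0]. The `int` conversion is only safe here because every entry of this particular R is a whole number.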


Important If A is invertible, its reduced row echelon form is the identity matrix R = I.
This is the ultimate in row reduction. Of course the nullspace is then Z. 

The zeros in R make it easy to find the special solutions (the same as before): 

1. Set x2 = 1 and x4 = 0. Solve Rx = 0. Then x1 = -1 and x3 = 0.
Those numbers -1 and 0 are sitting in column 2 of R (with plus signs).

2. Set x2 = 0 and x4 = 1. Solve Rx = 0. Then x1 = -1 and x3 = -1.
Those numbers -1 and -1 are sitting in column 4 (with plus signs).

By reversing signs we can read off the special solutions directly from R. The nullspace
N(A) = N(U) = N(R) contains all combinations of the special solutions:

$$ x = x_2 \begin{bmatrix} -1 \\ 1 \\ 0 \\ 0 \end{bmatrix} + x_4 \begin{bmatrix} -1 \\ 0 \\ -1 \\ 1 \end{bmatrix} = \text{(complete solution of } Ax = 0\text{)}. $$
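
Reading the special solutions off R by reversing signs can be sketched in Python. The function name `special_solutions` is hypothetical, and it assumes R is already in reduced row echelon form with its pivot columns supplied (numbered from 0, one per nonzero row).

```python
def special_solutions(R, pivot_cols):
    """Read the special solutions of Rx = 0 off a reduced echelon matrix R.

    pivot_cols lists the pivot columns (0-based), one per nonzero row of R.
    """
    n = len(R[0])
    free_cols = [j for j in range(n) if j not in pivot_cols]
    solutions = []
    for f in free_cols:
        s = [0] * n
        s[f] = 1                    # this free variable is 1, the others stay 0
        for row, p in zip(R, pivot_cols):
            s[p] = -row[f]          # reverse the sign of the entry in column f
        solutions.append(s)
    return solutions

R = [[1, 1, 0, 1], [0, 0, 1, 1], [0, 0, 0, 0]]
for s in special_solutions(R, [0, 2]):
    print(s)
# [-1, 1, 0, 0]
# [-1, 0, -1, 1]
```

One special solution is produced per free column, so the list is empty when every column has a pivot.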


The next section of the book moves firmly from U to the row reduced form R. The
MATLAB command [R, pivcol] = rref(A) produces R and also a list of the pivot columns.
= REVIEW OF THE KEY IDEAS =


1. The nullspace N(A) is a subspace of R^n. It contains all solutions to Ax = 0.


2. Elimination produces an echelon matrix U, and then a row reduced R, with pivot 
columns and free columns. 


3. Every free column of U or R leads to a special solution. The free variable equals 1 
and the other free variables equal 0. Back substitution solves Ax = 0. 


4. The complete solution to Ax = 0 is a combination of the special solutions. 


5. If n > m then A has at least one column without pivots, giving a special solution. So
there are nonzero vectors x in the nullspace of this rectangular A.


= WORKED EXAMPLES = 


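3.2 A Create a 3 by 4 matrix whose special solutions to Ax = 0 are s1 and s2:

$$ s_1 = \begin{bmatrix} -3 \\ 1 \\ 0 \\ 0 \end{bmatrix} \ \text{ and } \ s_2 = \begin{bmatrix} -2 \\ 0 \\ -6 \\ 1 \end{bmatrix} \qquad \begin{matrix} \text{pivot columns 1 and 3} \\ \text{free variables } x_2 \text{ and } x_4 \end{matrix} $$

You could create the matrix A in row reduced form R. Then describe all possible matrices
A with the required nullspace N(A) = all combinations of s1 and s2.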


Solution The reduced matrix R has pivots = 1 in columns 1 and 3. There is no third 
pivot, so the third row of R is all zeros. The free columns 2 and 4 will be combinations of 
the pivot columns: 


$$ R = \begin{bmatrix} 1 & 3 & 0 & 2 \\ 0 & 0 & 1 & 6 \\ 0 & 0 & 0 & 0 \end{bmatrix} \ \text{ has } \ Rs_1 = 0 \ \text{ and } \ Rs_2 = 0. $$

The entries 3, 2, 6 in R are the negatives of -3, -2, -6 in the special solutions!

R is only one matrix (one possible A) with the required nullspace. We could do any
elementary operations on R: exchange rows, multiply a row by any c ≠ 0, subtract any
multiple of one row from another. R can be multiplied (on the left) by any invertible
matrix, without changing its nullspace.
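
A small Python check of that last claim, using the R of this example. The matrix M below is a hypothetical choice (its determinant is 1, so it is invertible), and the helper names `matmul` and `matvec` are illustrative only.

```python
def matmul(M, R):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(M[i][k] * R[k][j] for k in range(len(R)))
             for j in range(len(R[0]))] for i in range(len(M))]

def matvec(A, x):
    """Multiply the matrix A by the vector x."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

R = [[1, 3, 0, 2], [0, 0, 1, 6], [0, 0, 0, 0]]
s1 = [-3, 1, 0, 0]
s2 = [-2, 0, -6, 1]

M = [[2, 1, 0], [1, 1, 0], [0, 3, 1]]   # assumed invertible matrix, det = 1
A = matmul(M, R)                        # another matrix with the same nullspace

print(A)
print(matvec(A, s1), matvec(A, s2))     # both are [0, 0, 0]
```

The check only confirms that s1 and s2 remain in the nullspace of A = MR. That the nullspace does not grow follows from the invertibility of M: if MRx = 0 then Rx = 0.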

Every 3 by 4 matrix has at least one special solution. These matrices have two. 




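3.2 B Find the special solutions and describe the complete solution to Ax = 0 for

$$ A_1 = \begin{bmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix} \qquad A_2 = \begin{bmatrix} 3 & 6 \\ 1 & 2 \end{bmatrix} \qquad A_3 = \begin{bmatrix} A_2 & A_2 \end{bmatrix} $$

Which are the pivot columns? Which are the free variables? What is R in each case?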


Solution A1x = 0 has four special solutions. They are the columns s1, s2, s3, s4 of the
4 by 4 identity matrix. The nullspace is all of R^4. The complete solution to A1x = 0 is
any x = c1s1 + c2s2 + c3s3 + c4s4 in R^4. There are no pivot columns; all variables are
free; the reduced R is the same zero matrix as A1.

A2x = 0 has only one special solution s = (-2, 1). The multiples x = cs give the
complete solution. The first column of A2 is its pivot column, and x2 is the free variable.
The row reduced matrices R2 for A2 and R3 for A3 = [A2 A2] have 1’s in the pivot:

$$ A_2 = \begin{bmatrix} 3 & 6 \\ 1 & 2 \end{bmatrix} \rightarrow R_2 = \begin{bmatrix} 1 & 2 \\ 0 & 0 \end{bmatrix} \qquad A_3 = \begin{bmatrix} 3 & 6 & 3 & 6 \\ 1 & 2 & 1 & 2 \end{bmatrix} \rightarrow R_3 = \begin{bmatrix} 1 & 2 & 1 & 2 \\ 0 & 0 & 0 & 0 \end{bmatrix} $$

Notice that R3 has only one pivot column (the first column). All the variables x2, x3, x4
are free. There are three special solutions to A3x = 0 (and also R3x = 0):

s1 = (-2, 1, 0, 0)   s2 = (-1, 0, 1, 0)   s3 = (-2, 0, 0, 1)   Complete solution x = c1s1 + c2s2 + c3s3.


With r pivots, A has n — r free variables. Ax = 0 has n — r special solutions. 


Problem Set 3.2 
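Questions 1-4 and 5-8 are about the matrices in Problems 1 and 5.

1 Reduce these matrices to their ordinary echelon forms U:

$$ \text{(a)}\ A = \begin{bmatrix} 1 & 2 & 2 & 4 & 6 \\ 1 & 2 & 3 & 6 & 9 \\ 0 & 0 & 1 & 2 & 3 \end{bmatrix} \qquad \text{(b)}\ B = \begin{bmatrix} 2 & 4 & 2 \\ 0 & 4 & 4 \\ 0 & 8 & 8 \end{bmatrix} $$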

Which are the free variables and which are the pivot variables? 


2 For the matrices in Problem 1, find a special solution for each free variable. (Set the 
free variable to 1. Set the other free variables to zero.) 


3 By combining the special solutions in Problem 2, describe every solution to Ax = 0
and Bx = 0. The nullspace contains only x = 0 when there are no ____.


4 By further row operations on each U in Problem 1, find the reduced echelon form R. 
True or false: The nullspace of R equals the nullspace of U. 


5 By row operations reduce each matrix to its echelon form U. Write down a 2 by 2
lower triangular L such that B = LU.

$$ \text{(a)}\ A = \begin{bmatrix} -1 & 3 & 5 \\ -2 & 6 & 10 \end{bmatrix} \qquad \text{(b)}\ B = \begin{bmatrix} -1 & 3 & 5 \\ -2 & 6 & 7 \end{bmatrix} $$


6 For the same A and B, find the special solutions to Ax = 0 and Bx = 0. For an m by
n matrix, the number of pivot variables plus the number of free variables is ____.


7 In Problem 5, describe the nullspaces of A and B in two ways. Give the equations 
for the plane or the line, and give all vectors x that satisfy those equations as combi- 
nations of the special solutions. 


8 Reduce the echelon forms U in Problem 5 to R. For each R draw a box around the 
identity matrix that is in the pivot rows and pivot columns. 


Questions 9-17 are about free variables and pivot variables. 
9 True or false (with reason if true or example to show it is false): 


(a) A square matrix has no free variables. 
(b) An invertible matrix has no free variables. 
(c) An m by n matrix has no more than n pivot variables. 


(d) An m by n matrix has no more than m pivot variables. 
10 Construct 3 by 3 matrices A to satisfy these requirements (if possible): 


(a) A has no zero entries but U = I. 
(b) A has no zero entries but R = I. 
(c) A has no zero entries but R = U. 
(d) A = U = 2R.
11 Put as many 1’s as possible in a 4 by 7 echelon matrix U whose pivot columns are 
(a) 2,4,5 
(b) 1,3, 6,7 
(c) 4 and 6. 


12 Put as many 1’s as possible in a 4 by 8 reduced echelon matrix R so that the free
columns are


(a) 2,4,5,6 
(b) 1, 3, 6,7, 8. 


13 Suppose column 4 of a 3 by 5 matrix is all zero. Then $x_4$ is certainly a ____
variable. The special solution for this variable is the vector x = ____.


14 Suppose the first and last columns of a 3 by 5 matrix are the same (not zero). Then
____ is a free variable. Find the special solution for this variable.
15 Suppose an m by n matrix has r pivots. The number of special solutions is ____.
The nullspace contains only x = 0 when r = ____. The column space is all of
$\mathbf{R}^m$ when r = ____.


16 The nullspace of a 5 by 5 matrix contains only x = 0 when the matrix has ____
pivots. The column space is $\mathbf{R}^5$ when there are ____ pivots. Explain why.


17 The equation x - 3y - z = 0 determines a plane in $\mathbf{R}^3$. What is the matrix A in
this equation? Which are the free variables? The special solutions are (3, 1, 0) and ____.


18 (Recommended) The plane x - 3y - z = 12 is parallel to the plane x - 3y - z = 0 in
Problem 17. One particular point on this plane is (12, 0, 0). All points on the plane
have the form (fill in the first components)

$$\begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} \_\_ \\ 0 \\ 0 \end{bmatrix} + y \begin{bmatrix} \_\_ \\ 1 \\ 0 \end{bmatrix} + z \begin{bmatrix} \_\_ \\ 0 \\ 1 \end{bmatrix}.$$


19 Prove that U and A = LU have the same nullspace when L is invertible:

If Ux = 0 then LUx = 0. If LUx = 0, how do you know Ux = 0?


20 Suppose column 1 + column 3 + column 5 = 0 in a 4 by 5 matrix with four pivots.
Which column is sure to have no pivot (and which variable is free)? What is the
special solution? What is the nullspace?


Questions 21-28 ask for matrices (if possible) with specific properties. 


21 Construct a matrix whose nullspace consists of all combinations of (2, 2, 1, 0) and
(3, 1, 0, 1).


22 Construct a matrix whose nullspace consists of all multiples of (4, 3, 2, 1).


23 Construct a matrix whose column space contains (1, 1, 5) and (0, 3, 1) and whose
nullspace contains (1, 1, 2).


24 Construct a matrix whose column space contains (1, 1, 0) and (0, 1, 1) and whose
nullspace contains (1, 0, 1) and (0, 0, 1).


25 Construct a matrix whose column space contains (1, 1, 1) and whose nullspace is the
line of multiples of (1, 1, 1, 1).


26 Construct a 2 by 2 matrix whose nullspace equals its column space. This is possible.

27 Why does no 3 by 3 matrix have a nullspace that equals its column space?

28 If AB = 0 then the column space of B is contained in the ____ of A. Give an
example of A and B.
29 The reduced form R of a 3 by 3 matrix with randomly chosen entries is almost sure
to be ____. What R is virtually certain if the random A is 4 by 3?


30 Show by example that these three statements are generally false:

(a) A and $A^T$ have the same nullspace.
(b) A and $A^T$ have the same free variables.
(c) If R is the reduced form rref(A) then $R^T$ is rref($A^T$).


31 If the nullspace of A consists of all multiples of x = (2, 1, 0, 1), how many pivots
appear in U? What is R?


32 If the special solutions to Rx = 0 are in the columns of these N, go backward to
find the nonzero rows of the reduced matrices R:

$$N = \begin{bmatrix} 2 & 3 \\ 1 & 0 \\ 0 & 1 \end{bmatrix} \quad \text{and} \quad N = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix} \quad \text{and} \quad N = [\ \ ] \ \text{(empty 3 by 1)}.$$


33 (a) What are the five 2 by 2 reduced echelon matrices R whose entries are all 0’s
and 1’s?

(b) What are the eight 1 by 3 matrices containing only 0’s and 1’s? Are all eight of
them reduced echelon matrices R?


34 Explain why A and -A always have the same reduced echelon form R.


Challenge Problems


35 If A is 4 by 4 and invertible, describe all vectors in the nullspace of the 4 by 8 matrix
B = [A A].


36 How is the nullspace N(C) related to the spaces N(A) and N(B), if $C = \begin{bmatrix} A \\ B \end{bmatrix}$?


37 Kirchhoff’s Law says that current in = current out at every node. This network has
six currents $y_1, \ldots, y_6$ (the arrows show the positive direction, each $y_i$ could be
positive or negative). Find the four equations Ay = 0 for Kirchhoff’s Law at the
four nodes. Find three special solutions in the nullspace of A.

[Figure: the network with four nodes and six currents $y_1, \ldots, y_6$.]
3.3 The Rank and the Row Reduced Form


The numbers m and n give the size of a matrix—but not necessarily the true size of a linear 
system. An equation like 0 = 0 should not count. If there are two identical rows in A, 
the second one disappears in elimination. Also if row 3 is a combination of rows 1 and 2, 
then row 3 will become all zeros in the triangular U and the reduced echelon form R. 
We don’t want to count rows of zeros. The true size of A is given by its rank:

DEFINITION The rank of A is the number of pivots. This number is r.


That definition is computational, and I would like to say more about the rank r. 
The matrix will eventually be reduced to r nonzero rows. Start with a 3 by 4 example. 


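$$\begin{matrix} \text{Four columns} \\ \text{How many pivots?} \end{matrix} \qquad A = \begin{bmatrix} 1 & 1 & 2 & 4 \\ 1 & 2 & 2 & 5 \\ 1 & 3 & 2 & 6 \end{bmatrix}. \qquad (1)$$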


The first two columns are (1, 1, 1) and (1,2,3), going in different directions. Those will 
be pivot columns. The third column (2, 2, 2) is a multiple of the first. We won’t see a pivot 
in that third column. The fourth column (4, 5, 6) is a combination of the first three (their 
sum). That column will also be without a pivot. 

The fourth column is actually a combination 3(1, 1,1) + (1,2,3) of the two pivot 
columns. Every “free column” is a combination of earlier pivot columns. It is the 
special solutions s that tell us those combinations of pivot columns: 


Column 3 = 2 (column 1)                    $s_1 = (-2, 0, 1, 0)$      $As_1 = 0$
Column 4 = 3 (column 1) + 1 (column 2)     $s_2 = (-3, -1, 0, 1)$     $As_2 = 0$

With nice numbers we can see the right combinations. The systematic way to find s is by
elimination! This will change the columns but it won’t change the combinations, because
Ax = 0 is equivalent to Ux = 0 and also Rx = 0. I will go from A to U and then to R:

$$\begin{bmatrix} 1 & 1 & 2 & 4 \\ 1 & 2 & 2 & 5 \\ 1 & 3 & 2 & 6 \end{bmatrix} \to \begin{bmatrix} 1 & 1 & 2 & 4 \\ 0 & 1 & 0 & 1 \\ 0 & 2 & 0 & 2 \end{bmatrix} \to \begin{bmatrix} 1 & 1 & 2 & 4 \\ 0 & 1 & 0 & 1 \\ 0 & 0 & 0 & 0 \end{bmatrix} = U$$


U already shows the two pivots in the pivot columns. The rank of A (and U) is 2. 
Continuing to R we see the combinations of pivot columns that produce the free columns: 


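$$U = \begin{bmatrix} 1 & 1 & 2 & 4 \\ 0 & 1 & 0 & 1 \\ 0 & 0 & 0 & 0 \end{bmatrix} \xrightarrow{\ \text{Subtract row 1} - \text{row 2}\ } R = \begin{bmatrix} 1 & 0 & 2 & 3 \\ 0 & 1 & 0 & 1 \\ 0 & 0 & 0 & 0 \end{bmatrix} \qquad (2)$$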


Clearly the (3, 1,0) column equals 3 (column 1) + column 2. Moving all columns to the 
“left side” will reverse signs to —3 and —1, which go in the special solution s: 


-3 (column 1) - (column 2) + (column 4) = 0          s = (-3, -1, 0, 1).
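The elimination steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names `rref` and `special_solutions` are ours (they are not the MATLAB command or the Teaching Code), exact fractions are used to avoid roundoff, and columns are numbered from 0 instead of 1.

```python
from fractions import Fraction

def rref(A):
    """Return (R, pivcol): reduced row echelon form and pivot columns (0-based)."""
    R = [[Fraction(x) for x in row] for row in A]
    m, n = len(R), len(R[0])
    pivcol, row = [], 0
    for col in range(n):
        if row == m:
            break
        # Look for a nonzero entry in this column, at or below the current row.
        p = next((i for i in range(row, m) if R[i][col] != 0), None)
        if p is None:
            continue                      # no pivot here: a free column
        R[row], R[p] = R[p], R[row]       # row exchange if needed
        pivot = R[row][col]
        R[row] = [x / pivot for x in R[row]]   # make the pivot equal to 1
        for i in range(m):                # clear the column above and below the pivot
            if i != row and R[i][col] != 0:
                f = R[i][col]
                R[i] = [a - f * b for a, b in zip(R[i], R[row])]
        pivcol.append(col)
        row += 1
    return R, pivcol

def special_solutions(R, pivcol):
    """One special solution per free column: that free variable is 1, the others 0."""
    n = len(R[0])
    free = [j for j in range(n) if j not in pivcol]
    solutions = []
    for f in free:
        s = [Fraction(0)] * n
        s[f] = Fraction(1)
        for i, p in enumerate(pivcol):
            s[p] = -R[i][f]               # entries of R change sign
        solutions.append(s)
    return solutions

A = [[1, 1, 2, 4],
     [1, 2, 2, 5],
     [1, 3, 2, 6]]
R, pivcol = rref(A)
specials = special_solutions(R, pivcol)
print("rank =", len(pivcol), " pivot columns (0-based) =", pivcol)
for s in specials:
    print("special solution:", [int(x) for x in s])
```

The two special solutions it finds are the $s_1$ and $s_2$ of this example, and the rank is the length of `pivcol`.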
Rank One


Matrices of rank one have only one pivot. When elimination produces zero in the first 
column, it produces zero in all the columns. Every row is a multiple of the pivot row. At 
the same time, every column is a multiple of the pivot column! 


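$$\text{Rank one matrix} \qquad A = \begin{bmatrix} 1 & 3 & 10 \\ 2 & 6 & 20 \\ 3 & 9 & 30 \end{bmatrix} \to R = \begin{bmatrix} 1 & 3 & 10 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}$$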


The column space of a rank one matrix is “one-dimensional”. Here all columns are on the
line through u = (1, 2, 3). The columns of A are u and 3u and 10u. Put those numbers
into the row $v^T = [\,1 \ \ 3 \ \ 10\,]$ and you have the special rank one form $A = uv^T$:

$$A = \text{column times row} = uv^T \qquad \begin{bmatrix} 1 & 3 & 10 \\ 2 & 6 & 20 \\ 3 & 9 & 30 \end{bmatrix} = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} \begin{bmatrix} 1 & 3 & 10 \end{bmatrix} \qquad (3)$$


With rank one, the solutions to Ax = 0 are easy to understand. That equation $u(v^Tx) = 0$
leads us to $v^Tx = 0$. All vectors x in the nullspace must be orthogonal to v in the
row space. This is the geometry: row space = line, nullspace = perpendicular plane.
Now describe the special solutions with numbers:

Pivot row [1 3 10], pivot variable $x_1$, free variables $x_2$ and $x_3$:

$$s_1 = \begin{bmatrix} -3 \\ 1 \\ 0 \end{bmatrix} \qquad s_2 = \begin{bmatrix} -10 \\ 0 \\ 1 \end{bmatrix}$$


The nullspace contains all combinations of $s_1$ and $s_2$. This produces the plane x + 3y +
10z = 0, perpendicular to the row (1, 3, 10). Nullspace (plane) perpendicular to row
space (line).
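A short numerical check of this rank one picture may help. The sketch below builds A as the outer product of u and v and confirms that both special solutions are perpendicular to the row v, so they lie in the nullspace. The helper name `dot` is ours.

```python
u = [1, 2, 3]            # every column of A is a multiple of u
v = [1, 3, 10]           # every row of A is a multiple of v

# Column times row: entry (i, j) of A is u[i] * v[j].
A = [[ui * vj for vj in v] for ui in u]

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

s1 = [-3, 1, 0]          # free variable x2 = 1, x3 = 0
s2 = [-10, 0, 1]         # free variable x2 = 0, x3 = 1

for s in (s1, s2):
    # v . s = 0 puts s in the plane x + 3y + 10z = 0 ...
    print("v . s =", dot(v, s))
    # ... and then every row of A (a multiple of v) also gives zero.
    print("A s   =", [dot(row, s) for row in A])
```

Since Ax = u times the number $v^Tx$, the single equation $v^Tx = 0$ is all that has to be checked.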


Example 1 When all rows are multiples of one pivot row, the rank is r = 1: 


Our second definition of rank will be at a higher level. It deals with entire rows and 
entire columns—vectors and not just numbers. The matrices A and U and R have r inde- 
pendent rows (the pivot rows). They also have r independent columns (the pivot columns). 
Section 3.5 says what it means for rows or columns to be independent. 

A third definition of rank, at the top level of linear algebra, will deal with spaces of 
vectors. The rank r is the “dimension” of the column space. It is also the dimension of 
the row space. The great thing is that r also reveals the dimension of the nullspace. 
The Pivot Columns


The pivot columns of R have 1’s in the pivots and 0’s everywhere else. The r pivot columns
taken together contain an r by r identity matrix I. It sits above m - r rows of zeros. The
numbers of the pivot columns are in the list pivcol.

The pivot columns of A are probably not obvious from A itself. But their column 
numbers are given by the same list pivcol. The r columns of A that eventually have pivots 
(in U and R) are the pivot columns of A. This example has pivcol = (1,3): 


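$$\begin{matrix} \text{Pivot} \\ \text{Columns} \end{matrix} \qquad A = \begin{bmatrix} 1 & 3 & 0 & 2 & -1 \\ 0 & 0 & 1 & 4 & -3 \\ 1 & 3 & 1 & 6 & -4 \end{bmatrix} \quad \text{yields} \quad R = \begin{bmatrix} 1 & 3 & 0 & 2 & -1 \\ 0 & 0 & 1 & 4 & -3 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}$$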


The column spaces of A and R are different! All columns of this R end with zeros. 
Elimination subtracts rows 1 and 2 of A from row 3, to produce that zero row in R: 


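$$E = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -1 & -1 & 1 \end{bmatrix} \quad \text{and} \quad E^{-1} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 1 & 1 & 1 \end{bmatrix}$$


The r pivot columns of A are also the first r columns of $E^{-1}$. The r by r identity matrix
inside R just picks out the first r columns of $E^{-1}$ as columns of $A = E^{-1}R$.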
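Here is a small check of those statements for the 3 by 5 example of this section, written as a sketch. The matrices are typed in by hand, the helper `matmul` is ours, and columns are numbered from 0, so pivcol = (1, 3) becomes `[0, 2]`.

```python
def matmul(X, Y):
    """Plain matrix product of two lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

A = [[1, 3, 0, 2, -1],
     [0, 0, 1, 4, -3],
     [1, 3, 1, 6, -4]]
R = [[1, 3, 0, 2, -1],
     [0, 0, 1, 4, -3],
     [0, 0, 0, 0, 0]]
E = [[1, 0, 0],
     [0, 1, 0],
     [-1, -1, 1]]        # subtracts rows 1 and 2 from row 3
E_inv = [[1, 0, 0],
         [0, 1, 0],
         [1, 1, 1]]      # adds them back

pivcol = [0, 2]          # 0-based version of pivcol = (1, 3)
r = len(pivcol)

print(matmul(E, A) == R)          # E A = R
print(matmul(E_inv, R) == A)      # A = E^{-1} R

# The pivot columns of A are the first r columns of E^{-1}.
pivot_columns_of_A = [[row[j] for row in A] for j in pivcol]
first_columns_of_E_inv = [[row[j] for row in E_inv] for j in range(r)]
print(pivot_columns_of_A == first_columns_of_E_inv)
```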

One more fact about pivot columns. Their definition has been purely computational, 
based on R. Here is a direct mathematical description of the pivot columns of A: 


A pivot column of R (with 1 in the pivot row) cannot be a combination of earlier
columns (with 0’s in that row). The same column of A can’t be a combination of earlier
columns, because Ax = 0 exactly when Rx = 0.

Now we look at the special solution x from each free column. 


The Special Solutions 


Each special solution to Ax = 0 and Rx = 0 has one free variable equal to 1. The other 
free variables in x are all zero. The solutions come directly from the echelon form R: 


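Free columns 2, 4, 5 (free variables $x_2$, $x_4$, $x_5$):

$$Rx = \begin{bmatrix} 1 & 3 & 0 & 2 & -1 \\ 0 & 0 & 1 & 4 & -3 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}$$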


Set the first free variable to $x_2 = 1$ with $x_4 = x_5 = 0$. The equations give the pivot
variables $x_1 = -3$ and $x_3 = 0$. The special solution is $s_1 = (-3, 1, 0, 0, 0)$.

The next special solution has $x_4 = 1$. The other free variables are $x_2 = x_5 = 0$. The
solution is $s_2 = (-2, 0, -4, 1, 0)$. Notice 2 and 4 in R, with plus signs.

The third special solution has $x_5 = 1$. With $x_2 = 0$ and $x_4 = 0$ we find $s_3 =
(1, 0, 3, 0, 1)$. The numbers $x_1 = 1$ and $x_3 = 3$ are in column 5 of R, again with opposite
signs. This is a general rule as we soon verify. The nullspace matrix N contains the three
special solutions in its columns, so AN = zero matrix:


$$\begin{matrix} \text{Nullspace matrix} \\ n - r = 5 - 2 \\ 3 \text{ special solutions} \end{matrix} \qquad N = \begin{bmatrix} -3 & -2 & 1 \\ 1 & 0 & 0 \\ 0 & -4 & 3 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{matrix} \text{not free} \\ \text{free} \\ \text{not free} \\ \text{free} \\ \text{free} \end{matrix}$$


The linear combinations of these three columns give all vectors in the nullspace. This is 
the complete solution to Ax = 0 (and Rx = 0). Where R had the identity matrix (2 by 2) 
in its pivot columns, N has the identity matrix (3 by 3) in its free rows. 
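The construction of N from R can be sketched as follows. This is an illustration, not the book's Teaching Code: R and the pivot list are typed in from the example (columns numbered from 0), and the helper names are ours.

```python
def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

A = [[1, 3, 0, 2, -1],
     [0, 0, 1, 4, -3],
     [1, 3, 1, 6, -4]]
R = [[1, 3, 0, 2, -1],
     [0, 0, 1, 4, -3],
     [0, 0, 0, 0, 0]]
pivcol = [0, 2]                                   # 0-based pivot columns
n = len(R[0])
free = [j for j in range(n) if j not in pivcol]   # 0-based free columns

specials = []
for f in free:
    s = [0] * n
    s[f] = 1                          # this free variable is 1, the others stay 0
    for i, p in enumerate(pivcol):
        s[p] = -R[i][f]               # pivot variables: entries of R with signs reversed
    specials.append(s)

# Special solutions go into the columns of N (n rows, n - r columns).
N = [[s[i] for s in specials] for i in range(n)]
for row in N:
    print(row)

print("R N = 0:", all(x == 0 for row in matmul(R, N) for x in row))
print("A N = 0:", all(x == 0 for row in matmul(A, N) for x in row))
```

The rows of N that belong to free variables (rows 2, 4, 5 in the book's numbering) form the 3 by 3 identity.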

There is a special solution for every free variable. Since r columns have pivots, that 
leaves n — r free variables. This is the key to Ax = 0 and the nullspace: 


When we introduce the idea of “independent” vectors, we will show that the special 
solutions are independent. You can see in N that no column is a combination of the other 
columns. The beautiful thing is that the count is exactly right: 


Ax = 0 has r independent equations so it has n - r independent solutions.


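The special solutions are easy for Rx = 0. Suppose that the first r columns are the
pivot columns. Then the reduced row echelon form looks like

$$R = \begin{bmatrix} I & F \\ 0 & 0 \end{bmatrix} \begin{matrix} r \text{ pivot rows} \\ m - r \text{ zero rows} \end{matrix} \qquad (4)$$

The first block column holds the r pivot columns. The second block column holds the n - r free columns. The special solutions go into the nullspace matrix:

$$N = \begin{bmatrix} -F \\ I \end{bmatrix} \begin{matrix} r \text{ pivot variables} \\ n - r \text{ free variables} \end{matrix} \qquad (5)$$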

Check RN = 0. The first block row of RN is (I times -F) + (F times I) = zero.
The columns of N solve Rx = 0. When the free part of Rx = 0 moves to the right side,
the left side just holds the identity matrix:

$$Rx = 0 \quad \text{means} \quad I \begin{bmatrix} \text{pivot} \\ \text{variables} \end{bmatrix} = -F \begin{bmatrix} \text{free} \\ \text{variables} \end{bmatrix}. \qquad (6)$$


In each special solution, the free variables are a column of I. Then the pivot variables are
a column of -F. Those special solutions give the nullspace matrix N.

The idea is still true if the pivot columns are mixed in with the free columns. Then I
and F are mixed together. You can still see -F in the solutions. Here is an example where
I = [1] comes first and F = [2 3] comes last.


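Example 2 The special solutions of $Rx = x_1 + 2x_2 + 3x_3 = 0$ are the columns of N:

$$R = \begin{bmatrix} 1 & 2 & 3 \end{bmatrix} \qquad N = \begin{bmatrix} -F \\ I \end{bmatrix} = \begin{bmatrix} -2 & -3 \\ 1 & 0 \\ 0 & 1 \end{bmatrix}.$$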


The rank is one. There are n — r = 3 — 1 special solutions (—2, 1,0) and (—3, 0, 1). 


Final Note How can I write confidently about R not knowing which steps MATLAB will 
take? A could be reduced to R in different ways. Very likely you and Mathematica and 
Maple would do the elimination differently. The key is that the final R is always the same. 
The original A completely determines the I and F and zero rows in R. 

For proof I will determine the pivot columns (which locate I) and free columns (which
contain F) in an “algebra way”: two rules that have nothing to do with any particular
elimination steps. Here are those rules:


1. The pivot columns are not combinations of earlier columns of A. 


2. The free columns are combinations of earlier columns (F tells the combinations). 


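A small example with rank one will show two E’s that produce the correct EA = R:

$$A = \begin{bmatrix} 2 & 2 \\ 1 & 1 \end{bmatrix} \quad \text{reduces to} \quad R = \begin{bmatrix} 1 & 1 \\ 0 & 0 \end{bmatrix} = \text{rref}(A) \text{ and no other } R.$$

You could multiply row 1 of A by 1/2, and subtract row 1 from row 2:

$$\text{Two steps give } E \qquad \begin{bmatrix} 1 & 0 \\ -1 & 1 \end{bmatrix} \begin{bmatrix} 1/2 & 0 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} 1/2 & 0 \\ -1/2 & 1 \end{bmatrix} = E.$$

Or you could exchange rows in A, and then subtract 2 times row 1 from row 2:

$$\text{Two different steps give } E_{\text{new}} \qquad \begin{bmatrix} 1 & 0 \\ -2 & 1 \end{bmatrix} \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} = \begin{bmatrix} 0 & 1 \\ 1 & -2 \end{bmatrix} = E_{\text{new}}.$$

Multiplication gives EA = R and also $E_{\text{new}}A = R$. Different E’s but the same R.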
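The claim "different E’s but the same R" is easy to test numerically. The sketch below assumes the rank one example is A = [2 2; 1 1] (the printed matrix is hard to read in this copy, so treat that as an assumption) and multiplies out the two sequences of elementary steps with exact fractions. The names `matmul`, `scale_row1` and so on are ours.

```python
from fractions import Fraction

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

half = Fraction(1, 2)
A = [[2, 2], [1, 1]]                 # assumed rank one example

# First route: multiply row 1 by 1/2, then subtract row 1 from row 2.
scale_row1 = [[half, 0], [0, 1]]
subtract_row1 = [[1, 0], [-1, 1]]
E = matmul(subtract_row1, scale_row1)

# Second route: exchange the rows, then subtract 2 times row 1 from row 2.
exchange = [[0, 1], [1, 0]]
subtract_2row1 = [[1, 0], [-2, 1]]
E_new = matmul(subtract_2row1, exchange)

R = matmul(E, A)
R_new = matmul(E_new, A)
print("E differs from E_new:", E != E_new)
print("same R:", R == R_new)
```

The later step always multiplies on the left, which is why `subtract_row1` comes first in the product.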
Codes for Row Reduction


There is no way that rref will ever come close in importance to lu. The Teaching Code elim 
for this book uses rref. Of course rref(R) would give R again! 


MATLAB: [R, pivcol] = rref(A)          Teaching Code: [E, R] = elim(A)


The extra output pivcol gives the numbers of the pivot columns. They are the same in A 
and R. The extra output E in the Teaching Code is an m by m elimination matrix that 
puts the original A (whatever it was) into its row reduced form R: 


EA=R. 


The square matrix E is the product of elementary matrices $E_{ij}$ and also $P_{ij}$ and $D^{-1}$.
$P_{ij}$ exchanges rows. The diagonal $D^{-1}$ divides rows by their pivots to produce 1’s.

If we want E, we can apply row reduction to the matrix [A I] with n + m columns.
All the elementary matrices that multiply A (to produce R) will also multiply I (to produce
E). The whole augmented matrix is being multiplied by E:


$$E \begin{bmatrix} A & I \end{bmatrix} = \begin{bmatrix} R & E \end{bmatrix} \qquad (7)$$

This is exactly what “Gauss-Jordan” did in Chapter 2 to compute $A^{-1}$. When A is
square and invertible, its reduced row echelon form is R = I. Then EA = R becomes
EA = I. In this invertible case, E is $A^{-1}$. This chapter is going further, to every A.
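A minimal sketch of this idea in Python follows. It is not the Teaching Code elim: the names `row_reduce` and `elim_sketch` are hypothetical, pivots are chosen only in the columns of A (so the I block just records the row operations), and the 2 by 2 matrix B is our own small invertible example.

```python
from fractions import Fraction

def row_reduce(M, ncols):
    """Row reduce M, choosing pivots only in the first ncols columns."""
    R = [[Fraction(x) for x in row] for row in M]
    m = len(R)
    pivcol, row = [], 0
    for col in range(ncols):
        if row == m:
            break
        p = next((i for i in range(row, m) if R[i][col] != 0), None)
        if p is None:
            continue                          # free column
        R[row], R[p] = R[p], R[row]
        pivot = R[row][col]
        R[row] = [x / pivot for x in R[row]]
        for i in range(m):
            if i != row and R[i][col] != 0:
                f = R[i][col]
                R[i] = [a - f * b for a, b in zip(R[i], R[row])]
        pivcol.append(col)
        row += 1
    return R, pivcol

def elim_sketch(A):
    """Hypothetical stand-in for elim(A): returns (E, R) with E A = R."""
    m, n = len(A), len(A[0])
    identity = [[int(i == j) for j in range(m)] for i in range(m)]
    augmented = [list(A[i]) + identity[i] for i in range(m)]   # [A I]
    reduced, _ = row_reduce(augmented, n)                      # [R E]
    return [r[n:] for r in reduced], [r[:n] for r in reduced]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

A = [[1, 3, 0, 2, -1],
     [0, 0, 1, 4, -3],
     [1, 3, 1, 6, -4]]
E, R = elim_sketch(A)
print("E A = R:", matmul(E, A) == R)

B = [[2, 1], [1, 1]]            # our own invertible example: R = I and E = B^{-1}
E_B, R_B = elim_sketch(B)
print("E_B B = I:", matmul(E_B, B) == [[1, 0], [0, 1]])
```

If the reduction were allowed to continue into the I block, it would still return some E with EA = R, but not necessarily this one.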


REVIEW OF THE KEY IDEAS


1. The rank r of A is the number of pivots (which are 1’s in R = rref(A)).
2. The r pivot columns of A and R are in the same list pivcol.
3. Those r pivot columns are not combinations of earlier columns.
4. The n - r free columns are combinations of earlier columns (pivot columns).
5. Those combinations (using -F taken from R) give the n - r special solutions to
Ax = 0 and Rx = 0. They are the n - r columns of the nullspace matrix N.


WORKED EXAMPLES


3.3A Find the reduced echelon form of A. What is the rank? What is the special solution
to Ax = 0?

$$\begin{matrix} \text{Second differences } -1, 2, -1 \\ \text{Notice } A_{11} = A_{44} = 1 \end{matrix} \qquad A = \begin{bmatrix} 1 & -1 & 0 & 0 \\ -1 & 2 & -1 & 0 \\ 0 & -1 & 2 & -1 \\ 0 & 0 & -1 & 1 \end{bmatrix}$$

Solution Add row 1 to row 2. Then add row 2 to row 3. Then add row 3 to row 4:

$$\text{First differences } 1, -1 \qquad U = \begin{bmatrix} 1 & -1 & 0 & 0 \\ 0 & 1 & -1 & 0 \\ 0 & 0 & 1 & -1 \\ 0 & 0 & 0 & 0 \end{bmatrix}$$

Now add row 3 to row 2. Then add row 2 to row 1:

$$\text{Reduced form} \qquad R = \begin{bmatrix} 1 & 0 & 0 & -1 \\ 0 & 1 & 0 & -1 \\ 0 & 0 & 1 & -1 \\ 0 & 0 & 0 & 0 \end{bmatrix} = \begin{bmatrix} I & F \\ 0 & 0 \end{bmatrix}$$

The rank is r = 3. There is one free variable (n - r = 1). The special solution is
s = (1, 1, 1, 1). Every row adds to 0. Notice -F = (1, 1, 1) in the pivot variables of s.


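3.3B Factor these rank one matrices into $A = uv^T$ = column times row:

$$A = \begin{bmatrix} 1 & 2 & 3 \\ 2 & 4 & 6 \\ 3 & 6 & 9 \end{bmatrix} \qquad A = \begin{bmatrix} a & b \\ c & d \end{bmatrix} \quad (\text{find } d \text{ from } a, b, c \text{ if } a \neq 0)$$


Split this rank two matrix into $u_1v_1^T + u_2v_2^T$ = (3 by 2) times (2 by 4) using R:

$$A = \begin{bmatrix} 1 & 1 & 0 & 2 \\ 1 & 2 & 0 & 3 \\ 2 & 3 & 0 & 5 \end{bmatrix} = \begin{bmatrix} 1 & 1 & 0 \\ 1 & 2 & 0 \\ 2 & 3 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 & 1 \\ 0 & 1 & 0 & 1 \\ 0 & 0 & 0 & 0 \end{bmatrix} = E^{-1}R.$$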


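Solution For the 3 by 3 matrix A, all rows are multiples of $v^T = [\,1 \ \ 2 \ \ 3\,]$. All columns
are multiples of the column u = (1, 2, 3). This symmetric matrix has u = v and A is $uu^T$.
Every rank one symmetric matrix will have this form or else $-uu^T$.

If the 2 by 2 matrix $\begin{bmatrix} a & b \\ c & d \end{bmatrix}$ has rank one, it must be singular. In Chapter 5, its determinant
is ad - bc = 0. In this chapter, row 2 is c/a times row 1.


$$\begin{bmatrix} a & b \\ c & d \end{bmatrix} = \begin{bmatrix} 1 \\ c/a \end{bmatrix} \begin{bmatrix} a & b \end{bmatrix} = \begin{bmatrix} a & b \\ c & bc/a \end{bmatrix}. \quad \text{So } d = \frac{bc}{a}.$$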


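The 3 by 4 matrix of rank two is a sum of two matrices of rank one. All columns of A
are combinations of the pivot columns 1 and 2. All rows are combinations of the nonzero
rows of R. The pivot columns are $u_1$ and $u_2$ and those rows are $v_1^T$ and $v_2^T$. Then A is
$u_1v_1^T + u_2v_2^T$, multiplying r columns of $E^{-1}$ times r rows of R:


$$\begin{matrix} \text{Columns} \\ \text{times} \\ \text{rows} \end{matrix} \qquad \begin{bmatrix} 1 & 1 & 0 & 2 \\ 1 & 2 & 0 & 3 \\ 2 & 3 & 0 & 5 \end{bmatrix} = \begin{bmatrix} 1 \\ 1 \\ 2 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 & 1 \end{bmatrix} + \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} \begin{bmatrix} 0 & 1 & 0 & 1 \end{bmatrix}$$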
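The "columns times rows" splitting can be checked by adding up the rank one pieces. In this sketch the helpers `outer` and `add` are ours, and $E^{-1}$ and R are typed in from the example.

```python
def outer(u, v):
    """Column u times row v: a rank one matrix (when u and v are nonzero)."""
    return [[ui * vj for vj in v] for ui in u]

def add(X, Y):
    return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(X, Y)]

E_inv = [[1, 1, 0],
         [1, 2, 0],
         [2, 3, 1]]
R = [[1, 0, 0, 1],
     [0, 1, 0, 1],
     [0, 0, 0, 0]]
r = 2                                   # rank = number of nonzero rows of R

A = [[0] * 4 for _ in range(3)]
for k in range(r):
    u_k = [row[k] for row in E_inv]     # column k of E^{-1} (a pivot column of A)
    v_k = R[k]                          # row k of R
    A = add(A, outer(u_k, v_k))

for row in A:
    print(row)
```

Only the first r columns of $E^{-1}$ are used. The last column multiplies the zero row of R and contributes nothing.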
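3.3 C Find the row reduced form R and the rank r of A and B (those depend on c).
Which are the pivot columns of A? What are the special solutions and the matrix N?


$$\text{Find special solutions} \qquad A = \begin{bmatrix} 1 & 2 & 1 \\ 3 & 6 & 3 \\ 4 & 8 & c \end{bmatrix} \quad \text{and} \quad B = \begin{bmatrix} c & c \\ c & c \end{bmatrix}$$


Solution The matrix A has rank r = 2 except if c = 4. The pivots are in columns 1
and 3. The second variable $x_2$ is free. Notice the form of R:


$$c \neq 4 \quad R = \begin{bmatrix} 1 & 2 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix} \qquad c = 4 \quad R = \begin{bmatrix} 1 & 2 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}$$


Two pivots leave one free variable $x_2$. But when c = 4, the only pivot is in column 1
(rank one). The second and third variables are free, producing two special solutions:


$$c \neq 4 \quad \text{Special solution with } x_2 = 1 \text{ goes into } N = \begin{bmatrix} -2 \\ 1 \\ 0 \end{bmatrix}$$

$$c = 4 \quad \text{Another special solution goes into } N = \begin{bmatrix} -2 & -1 \\ 1 & 0 \\ 0 & 1 \end{bmatrix}$$


The 2 by 2 matrix $B = \begin{bmatrix} c & c \\ c & c \end{bmatrix}$ has rank r = 1 except if c = 0, when the rank is zero!


$$c \neq 0 \quad R = \begin{bmatrix} 1 & 1 \\ 0 & 0 \end{bmatrix} \quad \text{and} \quad N = \begin{bmatrix} -1 \\ 1 \end{bmatrix} \qquad \text{Nullspace = line}$$


The matrix has no pivot columns if c = 0. Then both variables are free:


$$c = 0 \quad R = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix} \quad \text{and} \quad N = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \qquad \text{Nullspace} = \mathbf{R}^2.$$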


Problem Set 3.3 


1 Which of these rules gives a correct definition of the rank of A? 


(a) The number of nonzero rows in R, 
(b) The number of columns minus the total number of rows. 
(c) The number of columns minus the number of free columns. 


(d) The number of 1’s in the matrix R. 
2 Find the reduced row echelon forms R and the rank of these matrices:
(a) The 3 by 4 matrix with all entries equal to 4.
(b) The 3 by 4 matrix with $a_{ij} = i + j - 1$.
(c) The 3 by 4 matrix with $a_{ij} = (-1)^j$.


3 Find the reduced R for each of these (block) matrices: 


2 4 6 


4 Suppose all the pivot variables come last instead of first. Describe all four blocks in
the reduced echelon form (the block B should be r by r):

$$R = \begin{bmatrix} A & B \\ C & D \end{bmatrix}.$$

What is the nullspace matrix N containing the special solutions?


5 (Silly problem) Describe all 2 by 3 matrices $A_1$ and $A_2$, with row echelon forms
$R_1$ and $R_2$, such that $R_1 + R_2$ is the row echelon form of $A_1 + A_2$. Is it true that
$R_1 = A_1$ and $R_2 = A_2$ in this case? Does $R_1 - R_2$ equal rref($A_1 - A_2$)?


6 If A has r pivot columns, how do you know that AT has r pivot columns? Give a 3 
by 3 example with different column numbers in pivcol for A and AT. 


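7 What are the special solutions to Rx = 0 and $y^TR = 0$ for these R?

$$R = \begin{bmatrix} 1 & 0 & 2 & 3 \\ 0 & 1 & 4 & 5 \\ 0 & 0 & 0 & 0 \end{bmatrix} \qquad R = \begin{bmatrix} 0 & 1 & 2 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}$$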


Problems 8-11 are about matrices of rank r = 1. 


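8 Fill out these matrices so that they have rank 1:

$$A = \begin{bmatrix} 1 & 2 & 4 \\ 2 & \_\_ & \_\_ \\ 4 & \_\_ & \_\_ \end{bmatrix} \quad \text{and} \quad B = \begin{bmatrix} \_\_ & 9 & \_\_ \\ 1 & \_\_ & \_\_ \\ 2 & 6 & -3 \end{bmatrix} \quad \text{and} \quad M = \begin{bmatrix} a & b \\ c & \_\_ \end{bmatrix}.$$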


9 If A is an m by n matrix with r = 1, its columns are multiples of one column and its
rows are multiples of one row. The column space is a ____ in $\mathbf{R}^m$. The nullspace
is a ____ in $\mathbf{R}^n$. The nullspace matrix N has shape ____.

10 Choose vectors u and v so that $A = uv^T$ = column times row:
3 6 6 
A=/1 2 2 and a= = 3 sh 
4 8 8 


$A = uv^T$ is the natural form for every matrix that has rank r = 1.


11 If A is a rank one matrix, the second row of U is ____. Do an example.




Problems 12-14 are about r by r invertible matrices inside A. 


12 If A has rank r, then it has an r by r submatrix S that is invertible. Remove 
m —r rows and n —r columns to find an invertible submatrix S inside A, B, and C. 
You could keep the pivot rows and pivot columns: 


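$$A = \begin{bmatrix} 1 & 2 & 3 \\ 1 & 2 & 4 \end{bmatrix} \qquad B = \begin{bmatrix} 1 & 2 & 3 \\ 2 & 4 & 6 \end{bmatrix} \qquad C = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$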


13 Suppose P contains only the r pivot columns of an m by n matrix. Explain why this 
m by r submatrix P has rank r. 


14 Transpose P in Problem 13. Then find the r pivot columns of $P^T$. Transposing back,
this produces an r by r invertible submatrix S inside P and A:

$$\text{For } A = \begin{bmatrix} 1 & 2 & 3 \\ 2 & 4 & 6 \\ 2 & 4 & 7 \end{bmatrix} \text{ find } P \text{ (3 by 2) and then the invertible } S \text{ (2 by 2).}$$


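Problems 15-20 show that rank(AB) is not greater than rank(A) or rank(B).


15 Find the ranks of AB and AC (rank one matrix times rank one matrix):

$$A = \begin{bmatrix} 1 & 2 \\ 2 & 4 \end{bmatrix} \quad \text{and} \quad B = \begin{bmatrix} 2 & 1 & 4 \\ 3 & 1.5 & 6 \end{bmatrix} \quad \text{and} \quad C = \begin{bmatrix} 1 & b \\ c & bc \end{bmatrix}$$


16 The rank one matrix $uv^T$ times the rank one matrix $wz^T$ is $uz^T$ times the number
____. This product $uv^Twz^T$ also has rank one unless ____ = 0.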
17 (a) Suppose column j of B is a combination of previous columns of B. Show that
column j of AB is the same combination of previous columns of AB. Then
AB cannot have new pivot columns, so rank(AB) ≤ rank(B).

(b) Find $A_1$ and $A_2$ so that rank($A_1B$) = 1 and rank($A_2B$) = 0 for $B = \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}$.


18 Problem 17 proved that rank(AB) ≤ rank(B). Then the same reasoning gives
rank($B^TA^T$) ≤ rank($A^T$). How do you deduce that rank(AB) ≤ rank A?


19 (important) Suppose A and B are n by n matrices, and AB = I. Prove from
rank(AB) ≤ rank(A) that the rank of A is n. So A is invertible and B must be its
two-sided inverse (Section 2.5). Therefore BA = I (which is not so obvious!).


20 If A is 2 by 3 and B is 3 by 2 and AB = I, show from its rank that BA ≠ I. Give an
example of A and B with AB = I. For m < n, a right inverse is not a left inverse.


21 Suppose A and B have the same reduced row echelon form R. 


(a) Show that A and B have the same nullspace and the same row space. 
(b) We know $E_1A = R$ and $E_2B = R$. So A equals an ____ matrix times B.

Express A and then B as the sum of two rank one matrices:
1 1 0 
rank=2 A=|1 1 4 B=|> s| 
1 1 8 
Answer the same questions as in Worked Example 3.3 C for 
1 1 2 2 
A=|2 2 4 4] ad =("5° >|: 
I c 2 2 


What is the nullspace matrix N (containing the special solutions) for A, B, C?

$$A = \begin{bmatrix} I & I \end{bmatrix} \quad \text{and} \quad B = \begin{bmatrix} I & I \\ 0 & 0 \end{bmatrix} \quad \text{and} \quad C = \begin{bmatrix} I & I & I \end{bmatrix}.$$


Neat fact Every m by n matrix of rank r reduces to (m by r) times (r by n):

A = (pivot columns of A)(first r rows of R) = (COL)(ROW).

Write the 3 by 4 matrix A in equation (1) at the start of this section as the product of
the 3 by 2 matrix from the pivot columns and the 2 by 4 matrix from R.


Challenge Problems 


Suppose A is an m by n matrix of rank r. Its reduced echelon form is R. Describe 
exactly the matrix Z (its shape and all its entries) that comes from transposing the 
reduced row echelon form of R' (prime means transpose): 


R= rref(A) and Z = (rref(R’))’. 


Suppose R is m by n of rank r, with pivot columns first:

$$R = \begin{bmatrix} I & F \\ 0 & 0 \end{bmatrix}.$$

(a) What are the shapes of those four blocks?
(b) Find a right-inverse B with RB = I if r = m.
(c) Find a left-inverse C with CR = I if r = n.
(d) What is the reduced row echelon form of $R^T$ (with shapes)?
(e) What is the reduced row echelon form of $R^TR$ (with shapes)?


Prove that $R^TR$ has the same nullspace as R. Later we show that $A^TA$ always has
the same nullspace as A (a valuable fact).


Suppose you allow elementary column operations on A as well as elementary row 
operations (which get to R). What is the “row-and-column reduced form” for an m 
by n matrix of rank r? 
3.4 The Complete Solution to Ax = b


The last sections totally solved Ax = 0. Elimination converted the problem to Rx = 0. 
The free variables were given special values (one and zero). Then the pivot variables were 
found by back substitution. We paid no attention to the right side b because it started and 
ended as zero. The solution x was in the nullspace of A. 

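Now b is not zero. Row operations on the left side must act also on the right side.
Ax = b is reduced to a simpler system Rx = d. One way to organize that is to add b as
an extra column of the matrix. I will “augment” A with the right side $(b_1, b_2, b_3) =
(1, 6, 7)$ and reduce the bigger matrix [A b]:


$$\begin{bmatrix} 1 & 3 & 0 & 2 \\ 0 & 0 & 1 & 4 \\ 1 & 3 & 1 & 6 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix} = \begin{bmatrix} 1 \\ 6 \\ 7 \end{bmatrix} \quad \text{has the augmented matrix} \quad \begin{bmatrix} 1 & 3 & 0 & 2 & 1 \\ 0 & 0 & 1 & 4 & 6 \\ 1 & 3 & 1 & 6 & 7 \end{bmatrix} = \begin{bmatrix} A & b \end{bmatrix}.$$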


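The augmented matrix is just [A b]. When we apply the usual elimination steps to A,
we also apply them to b. That keeps all the equations correct.

In this example we subtract row 1 from row 3 and then subtract row 2 from row 3. This
produces a complete row of zeros in R, and it changes b to a new right side d = (1, 6, 0):


$$\begin{bmatrix} 1 & 3 & 0 & 2 \\ 0 & 0 & 1 & 4 \\ 0 & 0 & 0 & 0 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix} = \begin{bmatrix} 1 \\ 6 \\ 0 \end{bmatrix} \quad \text{has the augmented matrix} \quad \begin{bmatrix} 1 & 3 & 0 & 2 & 1 \\ 0 & 0 & 1 & 4 & 6 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix} = \begin{bmatrix} R & d \end{bmatrix}.$$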


That very last zero is crucial. The third equation has become 0 = 0 and the equations can
be solved. In the original matrix A, the first row plus the second row equals the third row.
If the equations are consistent, this must be true on the right side of the equations also!
The all-important property on the right side was 1 + 6 = 7.

Here are the same augmented matrices for a general b = (b1, b2, b3): 


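$$\begin{bmatrix} A & b \end{bmatrix} = \begin{bmatrix} 1 & 3 & 0 & 2 & b_1 \\ 0 & 0 & 1 & 4 & b_2 \\ 1 & 3 & 1 & 6 & b_3 \end{bmatrix} \to \begin{bmatrix} 1 & 3 & 0 & 2 & b_1 \\ 0 & 0 & 1 & 4 & b_2 \\ 0 & 0 & 0 & 0 & b_3 - b_1 - b_2 \end{bmatrix} = \begin{bmatrix} R & d \end{bmatrix}$$


Now we get 0 = 0 in the third equation provided $b_3 - b_1 - b_2 = 0$. This is $b_1 + b_2 = b_3$.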


One Particular Solution 


For an easy solution $x_p$, choose the free variables to be $x_2 = x_4 = 0$. Then the two nonzero
equations give the two pivot variables $x_1 = 1$ and $x_3 = 6$. Our particular solution to
Ax = b (and also Rx = d) is $x_p = (1, 0, 6, 0)$. This particular solution is my favorite:
free variables = zero, pivot variables from d. The method always works.

For a solution to exist, zero rows in R must also be zero in d. Since I is in the pivot
rows and pivot columns of R, the pivot variables in $x_{\text{particular}}$ come from d:


$$Rx_p = \begin{bmatrix} 1 & 3 & 0 & 2 \\ 0 & 0 & 1 & 4 \\ 0 & 0 & 0 & 0 \end{bmatrix} \begin{bmatrix} 1 \\ 0 \\ 6 \\ 0 \end{bmatrix} = \begin{bmatrix} 1 \\ 6 \\ 0 \end{bmatrix} \qquad \begin{matrix} \text{Pivot variables } 1, 6 \\ \text{Free variables } 0, 0 \end{matrix}$$


Notice how we choose the free variables (as zero) and solve for the pivot variables. After
the row reduction to R, those steps are quick. When the free variables are zero, the pivot
variables for $x_p$ are already seen in the right side vector d.


$x_{\text{particular}}$ solves $Ax_p = b$. $x_{\text{nullspace}}$ solves $Ax_n = 0$.


That particular solution is (1, 0, 6, 0). The two special (nullspace) solutions to
Rx = 0 come from the two free columns of R, by reversing signs of 3, 2, and 4.
Please notice how I write the complete solution $x_p + x_n$ to Ax = b:


$$x = x_p + x_n = \begin{bmatrix} 1 \\ 0 \\ 6 \\ 0 \end{bmatrix} + x_2 \begin{bmatrix} -3 \\ 1 \\ 0 \\ 0 \end{bmatrix} + x_4 \begin{bmatrix} -2 \\ 0 \\ -4 \\ 1 \end{bmatrix}.$$
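The whole procedure (reduce [A b] to [R d], read off the particular solution, add the special solutions) can be sketched in Python. This is an illustration with our own helper names, not MATLAB's backslash: `row_reduce` chooses pivots only in the columns of A, columns are numbered from 0, and exact fractions avoid roundoff.

```python
from fractions import Fraction

def row_reduce(M, ncols):
    """Row reduce M, choosing pivots only in the first ncols columns."""
    R = [[Fraction(x) for x in row] for row in M]
    m = len(R)
    pivcol, row = [], 0
    for col in range(ncols):
        if row == m:
            break
        p = next((i for i in range(row, m) if R[i][col] != 0), None)
        if p is None:
            continue                          # free column
        R[row], R[p] = R[p], R[row]
        pivot = R[row][col]
        R[row] = [x / pivot for x in R[row]]
        for i in range(m):
            if i != row and R[i][col] != 0:
                f = R[i][col]
                R[i] = [a - f * b for a, b in zip(R[i], R[row])]
        pivcol.append(col)
        row += 1
    return R, pivcol

A = [[1, 3, 0, 2], [0, 0, 1, 4], [1, 3, 1, 6]]
b = [1, 6, 7]
n = len(A[0])

reduced, pivcol = row_reduce([row + [bi] for row, bi in zip(A, b)], n)   # [A b] -> [R d]
R = [row[:n] for row in reduced]
d = [row[n] for row in reduced]
r = len(pivcol)
solvable = all(x == 0 for x in d[r:])      # zero rows of R need zeros in d

xp = [Fraction(0)] * n                     # free variables = 0
for i, p in enumerate(pivcol):
    xp[p] = d[i]                           # pivot variables come from d

specials = []
for f in (j for j in range(n) if j not in pivcol):
    s = [Fraction(0)] * n
    s[f] = Fraction(1)
    for i, p in enumerate(pivcol):
        s[p] = -R[i][f]
    specials.append(s)

def complete_solution(coeffs):
    """x = xp + (any combination of the special solutions)."""
    x = list(xp)
    for c, s in zip(coeffs, specials):
        x = [xi + c * si for xi, si in zip(x, s)]
    return x

def apply(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

x = complete_solution([5, -7])             # any choice of x2 and x4 works
print("solvable:", solvable, " A x = b:", apply(A, x) == b)
```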


Question Suppose A is a square invertible matrix, m = n = r. What are $x_p$ and $x_n$?

Answer The particular solution is the one and only solution $A^{-1}b$. There are no
special solutions or free variables. R = I has no zero rows. The only vector in the
nullspace is $x_n = 0$. The complete solution is $x = x_p + x_n = A^{-1}b + 0$.

This was the situation in Chapter 2. We didn’t mention the nullspace in that chapter.
N(A) contained only the zero vector. Reduction goes from [A b] to [I $A^{-1}b$]. The
original Ax = b is reduced all the way to $x = A^{-1}b$ which is d. This is a special case
here, but square invertible matrices are the ones we see most often in practice. So they got
their own chapter at the start of the book.

For small examples we can reduce [A b] to [R d]. For a large matrix,
MATLAB does it better. One particular solution (not necessarily ours) is A\b from backslash.
Here is an example with full column rank. Both columns have pivots.


Example 1 Find the condition on $(b_1, b_2, b_3)$ for Ax = b to be solvable, if


$$A = \begin{bmatrix} 1 & 1 \\ 1 & 2 \\ -2 & -3 \end{bmatrix} \quad \text{and} \quad b = \begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix}.$$


This condition puts b in the column space of A. Find the complete $x = x_p + x_n$.

Solution Use the augmented matrix, with its extra column b. Subtract row 1 of [A b]
from row 2, and add 2 times row 1 to row 3 to reach [R d]:


$$\begin{bmatrix} 1 & 1 & b_1 \\ 1 & 2 & b_2 \\ -2 & -3 & b_3 \end{bmatrix} \to \begin{bmatrix} 1 & 1 & b_1 \\ 0 & 1 & b_2 - b_1 \\ 0 & -1 & b_3 + 2b_1 \end{bmatrix} \to \begin{bmatrix} 1 & 0 & 2b_1 - b_2 \\ 0 & 1 & b_2 - b_1 \\ 0 & 0 & b_3 + b_1 + b_2 \end{bmatrix}$$


The last equation is 0 = 0 provided $b_3 + b_1 + b_2 = 0$. This is the condition to put b
in the column space; then Ax = b will be solvable. The rows of A add to the zero row.
So for consistency (these are equations!) the entries of b must also add to zero.

This example has no free variables since n - r = 2 - 2. Therefore no special solutions.
The nullspace solution is $x_n = 0$. The particular solution to Ax = b and Rx = d is at the
top of the augmented column d:


$$\text{Only solution} \qquad x = x_p + x_n = \begin{bmatrix} 2b_1 - b_2 \\ b_2 - b_1 \end{bmatrix} + \begin{bmatrix} 0 \\ 0 \end{bmatrix}.$$


If $b_3 + b_1 + b_2$ is not zero, there is no solution to Ax = b ($x_p$ doesn’t exist).
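The solvability condition and the formula for the only solution can be tested directly. The function name `solve_example` is ours, and the two right sides are our own test cases, one with entries adding to zero and one without.

```python
A = [[1, 1],
     [1, 2],
     [-2, -3]]

def solve_example(b):
    """Return the only solution of A x = b, or None when b is outside the column space."""
    b1, b2, b3 = b
    if b1 + b2 + b3 != 0:          # last row of [R d] would read 0 = nonzero
        return None
    return [2 * b1 - b2, b2 - b1]  # top of the column d

def apply(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

# The rows of A add to the zero row, so the entries of b must add to zero too.
print([sum(col) for col in zip(*A)])

good = [1, 2, -3]                  # 1 + 2 - 3 = 0
bad = [1, 2, 0]                    # entries do not add to zero
x = solve_example(good)
print(x, apply(A, x) == good)
print(solve_example(bad))
```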

This example is typical of an extremely important case: A has full column rank.
Every column has a pivot. The rank is r = n. The matrix is tall and thin (m ≥ n).
Row reduction puts I at the top, when A is reduced to R with rank n:


$$\text{Full column rank} \qquad R = \begin{bmatrix} I \\ 0 \end{bmatrix} = \begin{bmatrix} n \text{ by } n \text{ identity matrix} \\ m - n \text{ rows of zeros} \end{bmatrix} \qquad (1)$$


There are no free columns or free variables. The nullspace matrix is empty! 
We will collect together the different ways of recognizing this type of matrix. 


In the essential language of the next section, this A has independent columns.
Ax = 0 only happens when x = 0. In Chapter 4 we will add one more fact to the list:
The square matrix $A^TA$ is invertible when the rank is n.

In this case the nullspace of A (and R) has shrunk to the zero vector. The solution to 
Ax = b is unique (if it exists). There will be m — n (here 3 — 2) zero rows in R. So there 
are m — n conditions in order to have 0 = 0 in those rows, and b in the column space. 
With full column rank, Ax = b has one solution or no solution (m > n is overdetermined). 
The Complete Solution


The other extreme case is full row rank. Now Ax = b has one or infinitely many solutions.
In this case A must be short and wide (m ≤ n). A matrix has full row rank if r = m
(“independent rows”). Every row has a pivot, and here is an example.


Example 2 There are n = 3 unknowns but only m = 2 equations:


$$\text{Full row rank} \qquad \begin{aligned} x + y + z &= 3 \\ x + 2y - z &= 4 \end{aligned} \qquad (\text{rank } r = m = 2)$$


These are two planes in xyz space. The planes are not parallel so they intersect in a line.
This line of solutions is exactly what elimination will find. The particular solution will be
one point on the line. Adding the nullspace vectors $x_n$ will move us along the line. Then
$x = x_p + x_n$ gives the whole line of solutions.

We find $x_p$ and $x_n$ by elimination on [A b]. Subtract row 1 from row 2 and then
subtract row 2 from row 1:


$$\begin{bmatrix} 1 & 1 & 1 & 3 \\ 1 & 2 & -1 & 4 \end{bmatrix} \to \begin{bmatrix} 1 & 1 & 1 & 3 \\ 0 & 1 & -2 & 1 \end{bmatrix} \to \begin{bmatrix} 1 & 0 & 3 & 2 \\ 0 & 1 & -2 & 1 \end{bmatrix} = \begin{bmatrix} R & d \end{bmatrix}.$$


The particular solution has free variable x3 = 0. The special solution has x3 = 1: 


x_particular comes directly from d on the right side: x_p = (2, 1, 0)
x_special comes from the third column (free column) of R: s = (-3, 2, 1)


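It is wise to check that x_p and s satisfy the original equations Ax_p = b and As = 0:

$$\begin{aligned} 2 + 1 &= 3 \\ 2 + 2 &= 4 \end{aligned} \qquad\qquad \begin{aligned} -3 + 2 + 1 &= 0 \\ -3 + 4 - 1 &= 0 \end{aligned}$$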


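The nullspace solution x_n is any multiple of s. It moves along the line of solutions, starting
at x_particular. Please notice again how to write the answer:

$$x = x_p + x_n = \begin{bmatrix} 2 \\ 1 \\ 0 \end{bmatrix} + x_3 \begin{bmatrix} -3 \\ 2 \\ 1 \end{bmatrix}$$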


This line is drawn in Figure 3.3. Any point on the line could have been chosen as the 
particular solution; we chose the point with x3 = 0. 
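
The steps of Example 2 can be sketched in code. This is a minimal illustration, not one of the book’s Teaching Codes: the function names `rref_augmented` and `particular_and_special` are hypothetical, exact fractions stand in for hand arithmetic, and the system is assumed to be solvable.

```python
from fractions import Fraction

def rref_augmented(aug, n):
    """Reduce [A b] to [R d]. Pivots are searched only in the first n columns."""
    R = [[Fraction(v) for v in row] for row in aug]
    pivots, r = [], 0                      # r = index of the next pivot row
    for c in range(n):
        p = next((i for i in range(r, len(R)) if R[i][c] != 0), None)
        if p is None:
            continue                       # no pivot: column c is a free column
        R[r], R[p] = R[p], R[r]
        piv = R[r][c]
        R[r] = [v / piv for v in R[r]]     # make the pivot equal to 1
        for i in range(len(R)):
            if i != r:                     # zeros above and below the pivot
                f = R[i][c]
                R[i] = [a - f * b for a, b in zip(R[i], R[r])]
        pivots.append(c)
        r += 1
        if r == len(R):
            break
    return R, pivots

def particular_and_special(aug, n):
    """Read x_p from d and one special solution from each free column of R."""
    R, pivots = rref_augmented(aug, n)
    x_p = [Fraction(0)] * n                # free variables are set to zero
    for i, pc in enumerate(pivots):
        x_p[pc] = R[i][n]
    specials = []
    for fc in (c for c in range(n) if c not in pivots):
        s = [Fraction(0)] * n
        s[fc] = Fraction(1)                # this free variable is set to one
        for i, pc in enumerate(pivots):
            s[pc] = -R[i][fc]
        specials.append(s)
    return x_p, specials

aug = [[1, 1, 1, 3],
       [1, 2, -1, 4]]
x_p, specials = particular_and_special(aug, n=3)
print([int(v) for v in x_p])               # [2, 1, 0]
print([int(v) for v in specials[0]])       # [-3, 2, 1]
```

The two printed vectors are the x_p and s found by hand above.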

The particular solution is not multiplied by an arbitrary constant! The special solution 
is, and you understand why. 

Now we summarize this short wide case of full row rank. If m < n the equation 
Ax = b is underdetermined (many solutions). 




Figure 3.3: Complete solution = one particular solution + all nullspace solutions.
(The figure shows the line of solutions to Ax = b and the line of solutions to Ax = 0.)


In this case with m pivots, the rows are “linearly independent”. So the columns of A^T
are linearly independent. We are more than ready for the definition of linear independence,
as soon as we summarize the four possibilities, which depend on the rank. Notice how r,
m, n are the critical numbers.


The four possibilities for linear equations depend on the rank r:

r = m and r = n    Square and invertible    Ax = b has 1 solution
r = m and r < n    Short and wide           Ax = b has ∞ solutions
r < m and r = n    Tall and thin            Ax = b has 0 or 1 solution
r < m and r < n    Not full rank            Ax = b has 0 or ∞ solutions


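The reduced R will fall in the same category as the matrix A. In case the pivot columns
happen to come first, we can display these four possibilities for R. For Rx = d (and the
original Ax = b) to be solvable, d must end in m - r zeros.

$$\begin{array}{lcccc} \text{Four types} & R = \begin{bmatrix} I \end{bmatrix} & \begin{bmatrix} I & F \end{bmatrix} & \begin{bmatrix} I \\ 0 \end{bmatrix} & \begin{bmatrix} I & F \\ 0 & 0 \end{bmatrix} \\ \text{Their ranks} & r = m = n & r = m < n & r = n < m & r < m,\ r < n \end{array}$$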


Cases 1 and 2 have full row rank r = m. Cases 1 and 3 have full column rank r = n. 
Case 4 is the most general in theory and it is the least common in practice. 
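
The table of four possibilities is easy to restate as a small function of r, m, n. This is only a sketch; the name `classify` and the returned strings are illustrative choices, not notation from the text.

```python
def classify(m, n, r):
    """Number of solutions to Ax = b for an m by n matrix of rank r."""
    if not 0 <= r <= min(m, n):
        raise ValueError("the rank cannot exceed m or n")
    if r == m and r == n:
        return "1 solution"                  # square and invertible
    if r == m and r < n:
        return "infinitely many solutions"   # short and wide
    if r < m and r == n:
        return "0 or 1 solution"             # tall and thin
    return "0 or infinitely many solutions"  # not full rank

print(classify(2, 3, 2))   # infinitely many solutions (Example 2)
print(classify(3, 2, 2))   # 0 or 1 solution (full column rank)
```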


Note My classes used to stop at U before reaching R. Instead of reading the complete 
solution directly from Rx = d, we found it by back substitution from Ux = c. This 
reduction to U and back substitution for x is slightly faster. Now we prefer the complete
reduction: a single “1” in each pivot column. Everything is so clear in R (and the computer 
should do the hard work anyway) that we reduce all the way. 


■ REVIEW OF THE KEY IDEAS ■

1. The rank r is the number of pivots. The matrix R has m - r zero rows.
2. Ax = b is solvable if and only if the last m - r equations reduce to 0 = 0.
3. One particular solution x_p has all free variables equal to zero.
4. The pivot variables are determined after the free variables are chosen.
5. Full column rank r = n means no free variables: one solution or none.
6. Full row rank r = m means one solution if m = n or infinitely many if m < n.


■ WORKED EXAMPLES ■


3.4 A This question connects elimination (pivot columns and back substitution) to 
column space-nullspace-rank-solvability (the full picture). A has rank 2: 


$$Ax = b \text{ is}\qquad \begin{aligned} x_1 + 2x_2 + 3x_3 + 5x_4 &= b_1 \\ 2x_1 + 4x_2 + 8x_3 + 12x_4 &= b_2 \\ 3x_1 + 6x_2 + 7x_3 + 13x_4 &= b_3 \end{aligned}$$

1. Reduce [A b] to [U c], so that Ax = b becomes a triangular system Ux = c.
2. Find the condition on b1, b2, b3 for Ax = b to have a solution.
3. Describe the column space of A. Which plane in R^3?
4. Describe the nullspace of A. Which special solutions in R^4?
5. Find a particular solution to Ax = (0, 6, -6) and then the complete solution.
6. Reduce [U c] to [R d]: Special solutions from R, particular solution from d.


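Solution

1. The multipliers in elimination are 2 and 3 and -1. They take [A b] into [U c]:

$$\begin{bmatrix} 1 & 2 & 3 & 5 & b_1 \\ 2 & 4 & 8 & 12 & b_2 \\ 3 & 6 & 7 & 13 & b_3 \end{bmatrix} \to \begin{bmatrix} 1 & 2 & 3 & 5 & b_1 \\ 0 & 0 & 2 & 2 & b_2 - 2b_1 \\ 0 & 0 & -2 & -2 & b_3 - 3b_1 \end{bmatrix} \to \begin{bmatrix} 1 & 2 & 3 & 5 & b_1 \\ 0 & 0 & 2 & 2 & b_2 - 2b_1 \\ 0 & 0 & 0 & 0 & b_3 + b_2 - 5b_1 \end{bmatrix}$$

2. The last equation shows the solvability condition b3 + b2 - 5b1 = 0. Then 0 = 0.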

3. First description: The column space is the plane containing all combinations of the
pivot columns (1, 2, 3) and (3, 8, 7). The pivots are in columns 1 and 3. Second
description: The column space contains all vectors with b3 + b2 - 5b1 = 0. That
makes Ax = b solvable, so b is in the column space. All columns of A pass this test
b3 + b2 - 5b1 = 0. This is the equation for the plane in the first description!

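4. The special solutions have free variables x2 = 1, x4 = 0 and then x2 = 0, x4 = 1:

$$\text{Special solutions to } Ax = 0 \text{ (back substitution in } Ux = 0)\qquad s_1 = \begin{bmatrix} -2 \\ 1 \\ 0 \\ 0 \end{bmatrix} \qquad s_2 = \begin{bmatrix} -2 \\ 0 \\ -1 \\ 1 \end{bmatrix}$$

The nullspace N(A) in R^4 contains all x_n = c1 s1 + c2 s2.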
5. One particular solution x_p has free variables = zero. Back substitute in Ux = c:

$$\text{Particular solution to } Ax_p = b = (0, 6, -6)\qquad x_p = \begin{bmatrix} -9 \\ 0 \\ 3 \\ 0 \end{bmatrix}$$

This vector b satisfies b3 + b2 - 5b1 = 0. The complete solution to Ax = (0, 6, -6) is x = x_p + all x_n.
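6. In the reduced form R, the third column changes from (3, 2, 0) in U to (0, 1, 0).
The right side c = (0, 6, 0) becomes d = (-9, 3, 0) showing -9 and 3 in x_p:

$$\begin{bmatrix} U & c \end{bmatrix} = \begin{bmatrix} 1 & 2 & 3 & 5 & 0 \\ 0 & 0 & 2 & 2 & 6 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix} \to \begin{bmatrix} R & d \end{bmatrix} = \begin{bmatrix} 1 & 2 & 0 & 2 & -9 \\ 0 & 0 & 1 & 1 & 3 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}$$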
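
The answers to Worked Example 3.4 A can be checked by direct multiplication. The short sketch below uses plain Python lists; the helper names `matvec` and `passes_test` and the constants c1, c2 are illustrative choices.

```python
A = [[1, 2, 3, 5],
     [2, 4, 8, 12],
     [3, 6, 7, 13]]

def matvec(A, x):
    """Multiply the matrix A (a list of rows) by the vector x."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def passes_test(b):
    """Solvability condition from step 2: b3 + b2 - 5 b1 = 0."""
    return b[2] + b[1] - 5 * b[0] == 0

columns = [list(col) for col in zip(*A)]
x_p = [-9, 0, 3, 0]
s1 = [-2, 1, 0, 0]
s2 = [-2, 0, -1, 1]

# Every column of A lies in the plane b3 + b2 - 5 b1 = 0.
print(all(passes_test(col) for col in columns))   # True

# x_p solves Ax = (0, 6, -6); s1 and s2 solve Ax = 0.
print(matvec(A, x_p), matvec(A, s1), matvec(A, s2))

# Any choice of c1, c2 gives another solution x = x_p + c1 s1 + c2 s2.
c1, c2 = 4, -7
x = [p + c1 * a + c2 * b for p, a, b in zip(x_p, s1, s2)]
print(matvec(A, x))                               # [0, 6, -6]
```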


3.4 B If you have this information about the solutions to Ax = b for a specific b, what
does that tell you about the shape of A (and A itself)? And possibly about b.

1. There is exactly one solution.
2. All solutions to Ax = b have the form x = (2, 1) + c(1, 1).
3. There are no solutions.
4. All solutions to Ax = b have the form x = (1, 1, 0) + c(1, 0, 1).
5. There are infinitely many solutions.


Solution In case 1, with exactly one solution, A must have full column rank r = n.
The nullspace of A contains only the zero vector. Necessarily m ≥ n.

In case 2, A must have n = 2 columns (and m is arbitrary). With (1, 1) in the nullspace
of A, column 2 is the negative of column 1. Also A ≠ 0: the rank is 1. With x = (2, 1) as a
solution, b = 2(column 1) + (column 2). My choice for x_p would be (1, 0).

In case 3 we only know that b is not in the column space of A. The rank of A must be
less than m. I guess we know b ≠ 0, otherwise x = 0 would be a solution.

In case 4, A must have n = 3 columns. With (1, 0, 1) in the nullspace of A, column 3
is the negative of column 1. Column 2 must not be a multiple of column 1, or the nullspace
would contain another special solution. So the rank of A is 3 - 1 = 2. Necessarily A has
m ≥ 2 rows. The right side b is column 1 + column 2.

In case 5 with infinitely many solutions, the nullspace must contain nonzero vectors.
The rank r must be less than n (not full column rank), and b must be in the column space
of A. We don’t know if every b is in the column space, so we don’t know if r = m.


3.4 C Find the complete solution x = x_p + x_n by forward elimination on [A b]:

$$\begin{bmatrix} 1 & 2 & 1 & 0 \\ 2 & 4 & 4 & 8 \\ 4 & 8 & 6 & 8 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix} = \begin{bmatrix} 4 \\ 2 \\ 10 \end{bmatrix}$$

Find numbers y1, y2, y3 so that y1 (row 1) + y2 (row 2) + y3 (row 3) = zero row. Check
that b = (4, 2, 10) satisfies the condition y1 b1 + y2 b2 + y3 b3 = 0. Why is this the
condition for the equations to be solvable and b to be in the column space?


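Solution Forward elimination on [A b] produces a zero row in [U c]. The third equation
becomes 0 = 0 and the equations are consistent (and solvable):

$$\begin{bmatrix} 1 & 2 & 1 & 0 & 4 \\ 2 & 4 & 4 & 8 & 2 \\ 4 & 8 & 6 & 8 & 10 \end{bmatrix} \to \begin{bmatrix} 1 & 2 & 1 & 0 & 4 \\ 0 & 0 & 2 & 8 & -6 \\ 0 & 0 & 2 & 8 & -6 \end{bmatrix} \to \begin{bmatrix} 1 & 2 & 1 & 0 & 4 \\ 0 & 0 & 2 & 8 & -6 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}$$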


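Columns 1 and 3 contain pivots. The variables x2 and x4 are free. If we set those to zero
we can solve (back substitution) for the particular solution x_p = (7, 0, -3, 0). We see 7
and -3 again if elimination continues all the way to [R d]:

$$\begin{bmatrix} 1 & 2 & 1 & 0 & 4 \\ 0 & 0 & 2 & 8 & -6 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix} \to \begin{bmatrix} 1 & 2 & 1 & 0 & 4 \\ 0 & 0 & 1 & 4 & -3 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix} \to \begin{bmatrix} 1 & 2 & 0 & -4 & 7 \\ 0 & 0 & 1 & 4 & -3 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}$$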


For the nullspace part x_n with b = 0, set the free variables x2, x4 to 1, 0 and also 0, 1:

Special solutions s1 = (-2, 1, 0, 0) and s2 = (4, 0, -4, 1)

Then the complete solution to Ax = b (and Rx = d) is x_complete = x_p + c1 s1 + c2 s2.
The rows of A produced the zero row from 2(row 1) + (row 2) - (row 3) = (0, 0, 0, 0).
Thus y = (2, 1, -1). The same combination for b = (4, 2, 10) gives 2(4) + (2) - (10) = 0.
If a combination of the rows (on the left side) gives the zero row, then the same combination
must give zero on the right side. Of course! Otherwise no solution.

Later we will say this again in different words: If every column of A is perpendicular
to y = (2, 1, -1), then any combination b of those columns must also be perpendicular to
y. Otherwise b is not in the column space and Ax = b is not solvable.

And again: If y is in the nullspace of A^T then y must be perpendicular to every b in
the column space of A. Just looking ahead... 
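
The solvability condition in 3.4 C takes only a few lines to check. In this sketch the right side `b_bad` is a made-up vector, chosen only to show the test failing; it does not appear in the example.

```python
A = [[1, 2, 1, 0],
     [2, 4, 4, 8],
     [4, 8, 6, 8]]
b = [4, 2, 10]
y = [2, 1, -1]

# y1(row 1) + y2(row 2) + y3(row 3). Entry j is also y dotted with column j of A.
row_combo = [sum(y[i] * A[i][j] for i in range(3)) for j in range(4)]
print(row_combo)           # [0, 0, 0, 0]

# The same combination applied to the right side must give zero.
y_dot_b = sum(yi * bi for yi, bi in zip(y, b))
print(y_dot_b)             # 0

# A hypothetical right side that fails the condition: Ax = b_bad has no solution.
b_bad = [4, 2, 11]
y_dot_bad = sum(yi * bi for yi, bi in zip(y, b_bad))
print(y_dot_bad)           # -1
```

Elimination on [A b_bad] would end with the impossible equation 0 = 1 in the third row (up to sign), which is what the nonzero dot product predicts.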
Problem Set 3.4


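1 (Recommended) Execute the six steps of Worked Example 3.4 A to describe the
column space and nullspace of A and the complete solution to Ax = b:

$$A = \begin{bmatrix} 2 & 4 & 6 & 4 \\ 2 & 5 & 7 & 6 \\ 2 & 3 & 5 & 2 \end{bmatrix} \qquad b = \begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix} = \begin{bmatrix} 4 \\ 3 \\ 5 \end{bmatrix}$$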


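2 Carry out the same six steps for this matrix A with rank one. You will find two
conditions on b1, b2, b3 for Ax = b to be solvable. Together these two conditions
put b into the ____ space (two planes give a line):

$$A = \begin{bmatrix} 1 \\ 3 \\ 2 \end{bmatrix} \begin{bmatrix} 2 & 1 & 3 \end{bmatrix} = \begin{bmatrix} 2 & 1 & 3 \\ 6 & 3 & 9 \\ 4 & 2 & 6 \end{bmatrix} \qquad b = \begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix} = \begin{bmatrix} 10 \\ 30 \\ 20 \end{bmatrix}$$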


Questions 3-15 are about the solution of Ax = b. Follow the steps in the text to x_p
and x_n. Use the augmented matrix with last column b.


3 Write the complete solution as x_p plus any multiple of s in the nullspace:

$$\begin{aligned} x + 3y + 3z &= 1 \\ 2x + 6y + 9z &= 5 \\ -x - 3y + 3z &= 5. \end{aligned}$$


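4 Find the complete solution (also called the general solution) to

$$\begin{bmatrix} 1 & 3 & 1 & 2 \\ 2 & 6 & 4 & 8 \\ 0 & 0 & 2 & 4 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \\ t \end{bmatrix} = \begin{bmatrix} 1 \\ 3 \\ 1 \end{bmatrix}$$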


5 Under what condition on b1, b2, b3 is this system solvable? Include b as a fourth
column in elimination. Find all solutions when that condition holds:

$$\begin{aligned} x + 2y - 2z &= b_1 \\ 2x + 5y - 4z &= b_2 \\ 4x + 9y - 8z &= b_3. \end{aligned}$$


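6 What conditions on b1, b2, b3, b4 make each system solvable? Find x in that case:

$$\begin{bmatrix} 1 & 2 \\ 2 & 4 \\ 2 & 5 \\ 3 & 9 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} b_1 \\ b_2 \\ b_3 \\ b_4 \end{bmatrix} \qquad \begin{bmatrix} 1 & 2 & 3 \\ 2 & 4 & 6 \\ 2 & 5 & 7 \\ 3 & 9 & 12 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} b_1 \\ b_2 \\ b_3 \\ b_4 \end{bmatrix}$$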
7 Show by elimination that (b1, b2, b3) is in the column space if b3 - 2b2 + 4b1 = 0.

$$A = \begin{bmatrix} 1 & 3 & 1 \\ 3 & 8 & 2 \\ 2 & 4 & 0 \end{bmatrix}$$

What combination of the rows of A gives the zero row?


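8 Which vectors (b1, b2, b3) are in the column space of A? Which combinations of the
rows of A give zero?

$$\text{(a)}\ A = \begin{bmatrix} 1 & 2 & 1 \\ 2 & 6 & 3 \\ 0 & 2 & 5 \end{bmatrix} \qquad \text{(b)}\ A = \begin{bmatrix} 1 & 1 & 1 \\ 1 & 2 & 4 \\ 2 & 4 & 8 \end{bmatrix}$$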


9 (a) The Worked Example 3.4 A reached [U c] from [A b]. Put the multipliers
into L and verify that LU equals A and Lc equals b.

(b) Combine the pivot columns of A with the numbers -9 and 3 in the particular
solution x_p. What is that linear combination and why?


10 Construct a 2 by 3 system Ax = b with particular solution x_p = (2, 4, 0) and
homogeneous solution x_n = any multiple of (1, 1, 1).

11 Why can’t a 1 by 3 system have x_p = (2, 4, 0) and x_n = any multiple of (1, 1, 1)?


12 (a) If Ax = b has two solutions x1 and x2, find two solutions to Ax = 0.
(b) Then find another solution to Ax = 0 and another solution to Ax = b.


13 Explain why these are all false:

(a) The complete solution is any linear combination of x_p and x_n.
(b) A system Ax = b has at most one particular solution.
(c) The solution x_p with all free variables zero is the shortest solution (minimum
length ||x||). Find a 2 by 2 counterexample.
(d) If A is invertible there is no solution x_n in the nullspace.


14 Suppose column 5 of U has no pivot. Then x5 is a ____ variable. The zero vector
(is) (is not) the only solution to Ax = 0. If Ax = b has a solution, then it has ____
solutions.

15 Suppose row 3 of U has no pivot. Then that row is ____. The equation Ux = c
is only solvable provided ____. The equation Ax = b (is) (is not) (might not be)
solvable.


Questions 16-20 are about matrices of “full rank” r = m or r = n.

16 The largest possible rank of a 3 by 5 matrix is ____. Then there is a pivot in every
____ of U and R. The solution to Ax = b (always exists) (is unique). The column
space of A is ____. An example is A = ____.

17 The largest possible rank of a 6 by 4 matrix is ____. Then there is a pivot in
every ____ of U and R. The solution to Ax = b (always exists) (is unique). The
nullspace of A is ____. An example is A = ____.


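18 Find by elimination the rank of A and also the rank of A^T:

$$A = \begin{bmatrix} 1 & 4 & 0 \\ 2 & 11 & 5 \\ -1 & 2 & 10 \end{bmatrix} \quad\text{and}\quad A = \begin{bmatrix} 1 & 0 & 1 \\ 1 & 1 & 2 \\ 1 & 1 & q \end{bmatrix} \text{ (rank depends on } q).$$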


19 Find the rank of A and also of A^T A and also of AA^T:


2 0 
a=[j k | and A=|1 1 
1 2 


20 Reduce A to its echelon form U. Then find a triangular L so that A = LU.


1 0 1 0 
a= 52 4 and A=|/|2 2 0 3 
065 4 


21 Find the complete solution in the form x_p + x_n to these full rank systems:


x+ty+z=4 


= 4 b 
@) x+y+z O yarn, 


22 If Ax = b has infinitely many solutions, why is it impossible for Ax = B (new
right side) to have only one solution? Could Ax = B have no solution?


23 Choose the number q so that (if possible) the ranks are (a) 1, (b) 2, (c) 3:


6 4 2 
A-|-3 -2 -1| and B=? ; ‘|. 
9 é 4 q 2q 


24 Give examples of matrices A for which the number of solutions to Ax = b is

(a) 0 or 1, depending on b
(b) ∞, regardless of b
(c) 0 or ∞, depending on b
(d) 1, regardless of b.


25 Write down all known relations between r and m and n if Ax = b has

(a) no solution for some b
(b) infinitely many solutions for every b
(c) exactly one solution for some b, no solution for other b
(d) exactly one solution for every b.


Questions 26-33 are about Gauss-Jordan elimination (upwards as well as downwards) 
and the reduced echelon matrix R. 


26 Continue elimination from U to R. Divide rows by pivots so the new pivots are all 1. 
Then produce zeros above those pivots to reach R: 


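$$U = \begin{bmatrix} 2 & 4 & 4 \\ 0 & 3 & 6 \\ 0 & 0 & 0 \end{bmatrix} \quad\text{and}\quad U = \begin{bmatrix} 2 & 4 & 4 \\ 0 & 3 & 6 \\ 0 & 0 & 5 \end{bmatrix}$$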


27 Suppose U is square with n pivots (an invertible matrix). Explain why R = I. 


28 Apply Gauss-Jordan elimination to Ux = 0 and Ux = c. Reach Rx = 0 and 
Rx =d: 


[v ol=[o o 4 o] me [v el=fo 9 4 sl 


Solve Rx = 0 to find x_n (its free variable is x2 = 1). Solve Rx = d to find x_p (its
free variable is x2 = 0).


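29 Apply Gauss-Jordan elimination to reduce to Rx = 0 and Rx = d:

$$\begin{bmatrix} U & 0 \end{bmatrix} = \begin{bmatrix} 3 & 0 & 6 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix} \quad\text{and}\quad \begin{bmatrix} U & c \end{bmatrix} = \begin{bmatrix} 3 & 0 & 6 & 9 \\ 0 & 0 & 2 & 4 \\ 0 & 0 & 0 & 5 \end{bmatrix}$$

Solve Ux = 0 or Rx = 0 to find x_n (free variable = 1). What are the solutions to
Rx = d?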


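30 Reduce to Ux = c (Gaussian elimination) and then Rx = d (Gauss-Jordan):

$$Ax = \begin{bmatrix} 1 & 0 & 2 & 3 \\ 1 & 3 & 2 & 0 \\ 2 & 0 & 4 & 9 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix} = \begin{bmatrix} 2 \\ 5 \\ 10 \end{bmatrix} = b.$$

Find a particular solution x_p and all homogeneous solutions x_n.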


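31 Find matrices A and B with the given property or explain why you can’t:

$$\text{(a) The only solution of } Ax = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} \text{ is } x = \begin{bmatrix} 0 \\ 1 \end{bmatrix} \qquad \text{(b) The only solution of } Bx = \begin{bmatrix} 0 \\ 1 \end{bmatrix} \text{ is } x = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}$$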
32 Find the LU factorization of A and the complete solution to Ax = b:


and b= and then & = 


=ar 


1 
0 
0 
0 


= N = m 
A D U = 
A A We 


The complete solution to Ax = | ; | is x = | 1 | +e] S |. Find A. 


Challenge Problems

34 Suppose you know that the 3 by 4 matrix A has the vector s = (2, 3, 1, 0) as the only
special solution to Ax = 0.

(a) What is the rank of A and the complete solution to Ax = 0?
(b) What is the exact row reduced echelon form R of A?
(c) How do you know that Ax = b can be solved for all b?

35 Suppose K is the 9 by 9 second difference matrix (2’s on the diagonal, -1’s on
the diagonal above and also below). Solve the equation Kx = b = (10, ..., 10).
If you graph x1, ..., x9 above the points 1, ..., 9 on the x axis, I think the nine points
fall on a parabola.

36 Suppose Ax = b and Cx = b have the same (complete) solutions for every b.
Is it true that A = C?
3.5 Independence, Basis and Dimension


This important section is about the true size of a subspace. There are n columns in an 
m by n matrix. But the true “dimension” of the column space is not necessarily n. The 
dimension is measured by counting independent columns—and we have to say what that 
means. We will see that the true dimension of the column space is the rank r. 

The idea of independence applies to any vectors v1, ..., vn in any vector space. Most
of this section concentrates on the subspaces that we know and use, especially the column
space and the nullspace of A. In the last part we also study “vectors” that are not
column vectors. They can be matrices and functions; they can be linearly independent (or
dependent). First come the key examples using column vectors.

The goal is to understand a basis: independent vectors that “span the space”. 


Every vector in the space is a unique combination of the basis vectors. 


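We are at the heart of our subject, and we cannot go on without a basis. The four essential
ideas in this section (with first hints at their meaning) are:

1. Independent vectors (no extra vectors)
2. Spanning a space (enough vectors to produce the rest)
3. Basis for a space (not too many or too few)
4. Dimension of a space (the number of vectors in a basis)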


Linear Independence 


Our first definition of independence is not so conventional, but you are ready for it. 


The columns are independent when the nullspace N (A) contains only the zero vector. 
Let me illustrate linear independence (and dependence) with three vectors in R^3:


1. If three vectors are not in the same plane, they are independent. No combination of 
v1, v2, v3 in Figure 3.4 gives zero except 0v1 + 0v2 + 0v3.


2. If three vectors w1, w2, w3 are in the same plane, they are dependent. 


This idea of independence applies to 7 vectors in 12-dimensional space. If they are the 
columns of A, and independent, the nullspace only contains x = 0. None of the vectors is 
a combination of the other six vectors. 

Now we choose different words to express the same idea. The following definition of 
independence will apply to any sequence of vectors in any vector space. When the vectors 
are the columns of A, the two definitions say exactly the same thing. 
Figure 3.4: Independent vectors v1, v2, v3 (not in a plane). Only 0v1 + 0v2 + 0v3 gives the vector 0.
Dependent vectors w1, w2, w3 (in a plane). The combination w1 - w2 + w3 is (0, 0, 0).


Linear independence    x1 v1 + x2 v2 + ... + xn vn = 0 only happens when all x’s are zero.


If a combination gives 0, when the x’s are not all zero, the vectors are dependent. 

Correct language: “The sequence of vectors is linearly independent.” Acceptable 
shortcut: “The vectors are independent.” Unacceptable: “The matrix is independent.” 

A sequence of vectors is either dependent or independent. They can be combined to
give the zero vector (with nonzero x’s) or they can’t. So the key question is: Which combinations
of the vectors give zero? We begin with some small examples in R^2:


(a) The vectors (1, 0) and (0, 1) are independent. 

(b) The vectors (1, 0) and (1, 0.00001) are independent. 

(c) The vectors (1, 1) and (—1, —1) are dependent. 

(d) The vectors (1, 1) and (0, 0) are dependent because of the zero vector. 


(e) In R^2, any three vectors (a, b) and (c, d) and (e, f) are dependent.


Geometrically, (1, 1) and (-1, -1) are on a line through the origin. They are dependent.
To use the definition, find numbers x1 and x2 so that x1(1, 1) + x2(-1, -1) = (0, 0).
This is the same as solving Ax = 0:

$$\begin{bmatrix} 1 & -1 \\ 1 & -1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix} \quad\text{for } x_1 = 1 \text{ and } x_2 = 1.$$


The columns are dependent exactly when there is a nonzero vector in the nullspace. 
If one of the v’s is the zero vector, independence has no chance. Why not? 
Three vectors in R^2 cannot be independent! One way to see this: the matrix A with
those three columns must have a free variable and then a special solution to Ax = 0. 
Another way: If the first two vectors are independent, some combination will produce the 
third vector. See the second highlight below. 

Now move to three vectors in R^3. If one of them is a multiple of another one, these
vectors are dependent. But the complete test involves all three vectors at once. We put 
them in a matrix and try to solve Ax = 0. 


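Example 1 The columns of this A are dependent. Ax = 0 has a nonzero solution:

$$Ax = \begin{bmatrix} 1 & 0 & 3 \\ 2 & 1 & 5 \\ 1 & 0 & 3 \end{bmatrix} \begin{bmatrix} -3 \\ 1 \\ 1 \end{bmatrix} \quad\text{is}\quad -3 \begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix} + 1 \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix} + 1 \begin{bmatrix} 3 \\ 5 \\ 3 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}$$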


The rank is only r = 2. Independent columns produce full column rank r =n = 3. 
In that matrix the rows are also dependent. Row 1 minus row 3 is the zero row. For a 
square matrix, we will show that dependent columns imply dependent rows. 


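Question How to find that solution to Ax = 0? The systematic way is elimination.

$$A = \begin{bmatrix} 1 & 0 & 3 \\ 2 & 1 & 5 \\ 1 & 0 & 3 \end{bmatrix} \quad\text{reduces to}\quad R = \begin{bmatrix} 1 & 0 & 3 \\ 0 & 1 & -1 \\ 0 & 0 & 0 \end{bmatrix}$$

The solution x = (-3, 1, 1) was exactly the special solution. It shows how the free column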
(column 3) is a combination of the pivot columns. That kills independence! 
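
A short check of Example 1, as a sketch in plain Python. The free column of R holds the numbers 3 and -1, and those are the coefficients that build column 3 of A from the two pivot columns.

```python
A = [[1, 0, 3],
     [2, 1, 5],
     [1, 0, 3]]
cols = [list(col) for col in zip(*A)]

# Free column of R is (3, -1, 0): column 3 = 3(column 1) - 1(column 2).
combo = [3 * a - 1 * b for a, b in zip(cols[0], cols[1])]
print(combo == cols[2])    # True

# Moving column 3 to the other side gives the special solution x = (-3, 1, 1).
x = [-3, 1, 1]
Ax = [sum(a * xi for a, xi in zip(row, x)) for row in A]
print(Ax)                  # [0, 0, 0]
```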


One case is of special importance because it is clear from the start. Suppose seven
columns have five components each (m = 5 is less than n = 7). Then the columns must
be dependent. Any seven vectors from R^5 are dependent. The rank of A cannot be larger
than 5. There cannot be more than five pivots in five rows. Ax = 0 has at least 7 - 5 = 2
free variables, so it has nonzero solutions, which means that the columns are dependent.


This type of matrix has more columns than rows; it is short and wide. The columns are
certainly dependent if n > m, because Ax = 0 has a nonzero solution.

The columns might be dependent or might be independent if n ≤ m. Elimination will
reveal the r pivot columns. It is those r pivot columns that are independent.


Note Another way to describe linear dependence is this: “One vector is a combination
of the other vectors.” That sounds clear. Why don’t we say this from the start? Our
definition was longer: “Some combination gives the zero vector, other than the trivial
combination with every x = 0.” We must rule out the easy way to get the zero vector.
That trivial combination of zeros gives every author a headache. If one vector is a combination
of the others, that vector has coefficient x = 1.

The point is, our definition doesn’t pick out one particular vector as guilty. All columns 
of A are treated the same. We look at Ax = 0, and it has a nonzero solution or it hasn’t. In 
the end that is better than asking if the last column (or the first, or a column in the middle) 
is a combination of the others. 


Vectors that Span a Subspace 


The first subspace in this book was the column space. Starting with columns v1, ..., vn,
the subspace was filled out by including all combinations x1 v1 + ... + xn vn. The column
space consists of all combinations Ax of the columns. We now introduce the single word
“span” to describe this: The column space is spanned by the columns.


The columns of a matrix span its column space. They might be dependent. 


Example 2 v1 = (1, 0) and v2 = (0, 1) span the full two-dimensional space R^2.


Example 3 vi = pi v2 = [9] v3 = [5 also span the full space R?. 


Example 4 w1 = (1, 1) and w2 = (-1, -1) only span a line in R^2. So does w1 by itself.


Think of two vectors coming out from (0, 0, 0) in 3-dimensional space. Generally they
span a plane. Your mind fills in that plane by taking linear combinations. Mathematically
you know other possibilities: two vectors could span a line, three vectors could span all of
R^3, or only a plane. It is even possible that three vectors span only a line, or ten vectors
span only a plane. They are certainly not independent!

The columns span the column space. Here is a new subspace, which is spanned by the
rows. The combinations of the rows produce the “row space”.


DEFINITION The row space of a matrix is the subspace of R^n spanned by the rows.

The row space of A is C(A^T). It is the column space of A^T.


The rows of an m by n matrix have n components. They are vectors in R^n, or they
would be if they were written as column vectors. There is a quick way to fix that: Transpose
the matrix. Instead of the rows of A, look at the columns of A^T. Same numbers, but now
in the column space C(A^T). This row space of A is a subspace of R^n.
Example 5 Describe the column space and the row space of A.

$$A = \begin{bmatrix} 1 & 4 \\ 2 & 7 \\ 3 & 5 \end{bmatrix} \quad\text{and}\quad A^T = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 7 & 5 \end{bmatrix}. \quad\text{Here } m = 3 \text{ and } n = 2.$$


The column space of A is the plane in R^3 spanned by the two columns of A. The row space
of A is spanned by the three rows of A (which are columns of A^T). This row space is all
of R^2. Remember: The rows are in R^n spanning the row space. The columns are in R^m
spanning the column space. Same numbers, different vectors, different spaces.


A Basis for a Vector Space 


Two vectors can’t span all of R^3, even if they are independent. Four vectors can’t be
independent, even if they span R^3. We want enough independent vectors to span the
space (and not more). A “basis” is just right.


DEFINITION A basis for a vector space is a sequence of vectors with two properties:
The basis vectors are linearly independent and they span the space.


This combination of properties is fundamental to linear algebra. Every vector v in the space
is a combination of the basis vectors, because they span the space. More than that, the combination
that produces v is unique, because the basis vectors v1, ..., vn are independent:

There is one and only one way to write v as a combination of the basis vectors.

Reason: Suppose v = a1 v1 + ... + an vn and also v = b1 v1 + ... + bn vn. By subtraction
(a1 - b1)v1 + ... + (an - bn)vn is the zero vector. From the independence of the v’s, each
ai - bi = 0. Hence ai = bi, and there are not two ways to produce v.


Example 6 The columns of $I = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$ produce the “standard basis” for R^2.

The basis vectors i = (1, 0) and j = (0, 1) are independent. They span R^2.

Everybody thinks of this basis first. The vector i goes across and j goes straight up. The
columns of the 3 by 3 identity matrix are the standard basis i, j, k. The columns of the n
by n identity matrix give the “standard basis” for R^n.

Now we find many other bases (infinitely many). The basis is not unique! 
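Example 7 (Important) The columns of every invertible n by n matrix give a basis for R^n:

$$\begin{array}{l} \text{Invertible matrix} \\ \text{Independent columns} \\ \text{Column space is } \mathbf{R}^3 \end{array} \quad A = \begin{bmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 1 & 1 & 1 \end{bmatrix} \qquad \begin{array}{l} \text{Singular matrix} \\ \text{Dependent columns} \\ \text{Column space} \neq \mathbf{R}^3 \end{array} \quad B = \begin{bmatrix} 1 & 0 & 1 \\ 1 & 1 & 2 \\ 1 & 1 & 2 \end{bmatrix}$$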


The only solution to Ax = 0 is x = A^{-1}0 = 0. The columns are independent. They span
the whole space R^n, because every vector b is a combination of the columns. Ax = b can
always be solved by x = A^{-1}b. Do you see how everything comes together for invertible
matrices? Here it is in one sentence:

The vectors v1, ..., vn are a basis for R^n exactly when they are the columns of an n by n
invertible matrix. Thus R^n has infinitely many different bases.
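
To see the unique combination concretely, take the invertible A of Example 7 and find the coefficients that produce a given b. In this sketch the right side b = (2, 5, 9) is an arbitrary choice, and forward substitution is enough only because this particular A happens to be lower triangular. A general invertible matrix would need full elimination.

```python
from fractions import Fraction

# The columns (1,1,1), (0,1,1), (0,0,1) of this A are a basis for R^3.
A = [[1, 0, 0],
     [1, 1, 0],
     [1, 1, 1]]

def forward_substitute(L, b):
    """Solve Lx = b for a lower triangular L with nonzero diagonal entries."""
    x = []
    for i, row in enumerate(L):
        known = sum(row[j] * x[j] for j in range(i))   # part already determined
        x.append(Fraction(b[i] - known) / row[i])
    return x

b = [2, 5, 9]
coeffs = forward_substitute(A, b)
print([int(c) for c in coeffs])    # [2, 3, 4]

# Rebuild b from the basis: 2(column 1) + 3(column 2) + 4(column 3).
rebuilt = [sum(A[i][j] * coeffs[j] for j in range(3)) for i in range(3)]
print(rebuilt == b)                # True
```

No other set of coefficients works, because the nullspace of A contains only x = 0.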


When the columns are dependent, we keep only the pivot columns: the first two columns
of B above, with its two pivots. They are independent and they span the column space.

The pivot columns of A are a basis for its column space. The pivot rows of A are a basis
for its row space. So are the pivot rows of its echelon form R.


Example 8 This matrix is not invertible. Its columns are not a basis for anything! 


One pivot column _ [2 4 _ [1 2 
One pivot row (r = 1) A É 4 reduces to R = lo ol 


Column 1 of A is the pivot column. That column alone is a basis for its column space. 
The second column of A would be a different basis. So would any nonzero multiple of that 
column. There is no shortage of bases. One definite choice is the pivot columns. 

Notice that the pivot column (1,0) of this R ends in zero. That column is a basis for 
the column space of R, but it doesn’t belong to the column space of A. The column spaces 
of A and R are different. Their bases are different. (Their dimensions are the same.) 

The row space of A is the same as the row space of R. It contains (2, 4) and (1, 2) and 
all other multiples of those vectors. As always, there are infinitely many bases to choose 
from. One natural choice is to pick the nonzero rows of R (rows with a pivot). So this 
matrix A with rank one has only one vector in the basis: 


Basis for the column space: H . Basis for the row space: | | . 


The next chapter will come back to these bases for the column space and row space. We 
are happy first with examples where the situation is clear (and the idea of a basis is still 
new). The next example is larger but still clear. 


Example 9 Find bases for the column and row spaces of this rank two matrix: 


$$R = \begin{bmatrix} 1 & 2 & 0 & 3 \\ 0 & 0 & 1 & 4 \\ 0 & 0 & 0 & 0 \end{bmatrix}$$

Columns 1 and 3 are the pivot columns. They are a basis for the column space (of R!).
The vectors in that column space all have the form b = (x, y, 0). The column space of
R is the “xy plane” inside the full 3-dimensional xyz space. That plane is not R^2, it is a
subspace of R^3. Columns 2 and 3 are also a basis for the same column space. Which pairs
of columns of R are not a basis for its column space?

The row space of R is a subspace of R^4. The simplest basis for that row space is the
two nonzero rows of R. The third row (the zero vector) is in the row space too. But it is 
not in a basis for the row space. The basis vectors must be independent. 


Question Given five vectors, how do you find a basis for the space they span?

First answer Make them the rows of A, and eliminate to find the nonzero rows of R.
Second answer Put the five vectors into the columns of A. Eliminate to find the pivot
columns (of A not R). The program colbasis uses the column numbers from pivcol.
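
The second answer can be sketched in Python. This is not the Teaching Code itself: `pivot_columns` and `column_basis` are hypothetical stand-ins for what pivcol and colbasis are described as doing, and the matrix is the A of Worked Example 3.4 A. Column indices are 0-based here.

```python
from fractions import Fraction

def pivot_columns(A):
    """Forward elimination to echelon form; return 0-based pivot column indices."""
    U = [[Fraction(v) for v in row] for row in A]
    m, n = len(U), len(U[0])
    pivots, r = [], 0
    for c in range(n):
        p = next((i for i in range(r, m) if U[i][c] != 0), None)
        if p is None:
            continue                   # free column
        U[r], U[p] = U[p], U[r]        # row exchange if needed
        for i in range(r + 1, m):      # zeros below the pivot
            f = U[i][c] / U[r][c]
            U[i] = [a - f * b for a, b in zip(U[i], U[r])]
        pivots.append(c)
        r += 1
        if r == m:
            break
    return pivots

def column_basis(A):
    """Pivot columns taken from A itself (not from U or R)."""
    cols = list(zip(*A))
    return [list(cols[c]) for c in pivot_columns(A)]

A = [[1, 2, 3, 5],
     [2, 4, 8, 12],
     [3, 6, 7, 13]]
print(pivot_columns(A))    # [0, 2]
print(column_basis(A))     # [[1, 2, 3], [3, 8, 7]]
```

The basis vectors are columns of the original A. The pivot columns of R would span a different space, even though they sit in the same positions.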

Could another basis have more vectors, or fewer? This is a crucial question with a good 
answer: No. All bases for a vector space contain the same number of vectors. 


The number of vectors, in any and every basis, is the “dimension” of the space. 


Dimension of a Vector Space 


We have to prove what was just stated. There are many choices for the basis vectors, but
the number of basis vectors doesn’t change.

If v1, ..., vm and w1, ..., wn are both bases for the same vector space, then m = n.

Proof Suppose that there are more w’s than v’s. From n > m we want to reach a contradiction.
The v’s are a basis, so w1 must be a combination of the v’s. If w1 equals
a11 v1 + ... + am1 vm, this is the first column of a matrix multiplication VA:


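$$\text{Each } w \text{ is a combination of the } v\text{'s}\qquad W = \begin{bmatrix} w_1 & w_2 & \cdots & w_n \end{bmatrix} = \begin{bmatrix} v_1 & \cdots & v_m \end{bmatrix} \begin{bmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & & \vdots \\ a_{m1} & \cdots & a_{mn} \end{bmatrix} = VA.$$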


We don’t know each a_ij, but we know the shape of A (it is m by n). The second vector
w2 is also a combination of the v’s. The coefficients in that combination fill the second
column of A. The key is that A has a row for every v and a column for every w. A is a
short wide matrix, since we assumed n > m. So Ax = 0 has a nonzero solution.

Ax = 0 gives VAx = 0 which is Wx = 0. A combination of the w’s gives zero! Then
the w’s could not be a basis, so our assumption n > m is not possible for two bases.

If m > n we exchange the v’s and w’s and repeat the same steps. The only way to 
avoid a contradiction is to have m = n. This completes the proof that m = n. 


The number of basis vectors depends on the space, not on a particular basis. The
number is the same for every basis, and it counts the “degrees of freedom” in the space.
The dimension of the space R^n is n. We now introduce the important word dimension
for other vector spaces too.

DEFINITION The dimension of a space is the number of vectors in every basis.


This matches our intuition. The line through v = (1, 5, 2) has dimension one. It is a
subspace with this one vector v in its basis. Perpendicular to that line is the plane
x + 5y + 2z = 0. This plane has dimension 2. To prove it, we find a basis (-5, 1, 0)
and (-2, 0, 1). The dimension is 2 because the basis contains two vectors.

The plane is the nullspace of the matrix A = [ 1 5 2 ], which has two free variables.
Our basis vectors (-5, 1, 0) and (-2, 0, 1) are the “special solutions” to Ax = 0. The
next section shows that the n - r special solutions always give a basis for the nullspace.
C(A) has dimension r and the nullspace N(A) has dimension n - r.
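
As a quick numerical check of this example, here is a minimal Python sketch. The helper names `dot` and `combine` are my own choices for illustration and are not part of the text. It confirms that both special solutions satisfy Ax = 0, and it shows why a combination of them can only be zero when both coefficients are zero.

```python
# The 1 by 3 matrix A = [1 5 2] and the two special solutions from the text.
A = [1, 5, 2]
s1 = [-5, 1, 0]   # free variables y = 1, z = 0
s2 = [-2, 0, 1]   # free variables y = 0, z = 1

def dot(u, v):
    """Dot product of two lists of numbers."""
    return sum(a * b for a, b in zip(u, v))

def combine(c1, c2):
    """Return the combination c1*s1 + c2*s2."""
    return [c1 * a + c2 * b for a, b in zip(s1, s2)]

# Both special solutions lie in the plane x + 5y + 2z = 0.
residuals = [dot(A, s1), dot(A, s2)]
print(residuals)

# The last two components of c1*s1 + c2*s2 are exactly (c1, c2),
# so the combination is the zero vector only when c1 = c2 = 0.
x = combine(3, 4)
print(x, dot(A, x))
```

The combination with coefficients 3 and 4 still lies in the plane, which is what "the special solutions span the nullspace" means for this A.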


Note about the language of linear algebra We never say “the rank of a space” or “the 
dimension of a basis” or “the basis of a matrix”. Those terms have no meaning. It is the 
dimension of the column space that equals the rank of the matrix. 


Bases for Matrix Spaces and Function Spaces 


The words “independence” and “basis” and “dimension” are not at all restricted to column
vectors. We can ask whether three matrices $A_1, A_2, A_3$ are independent. When they are in
the space of all 3 by 4 matrices, some combination might give the zero matrix. We can also
ask the dimension of the full 3 by 4 matrix space. (It is 12.)

In differential equations, $d^2y/dx^2 = y$ has a space of solutions. One basis is $y = e^x$
and $y = e^{-x}$. Counting the basis functions gives the dimension 2 for the space of all
solutions. (The dimension is 2 because of the second derivative.)

Matrix spaces and function spaces may look a little strange after $R^n$. But in some
way, you haven’t got the ideas of basis and dimension straight until you can apply them to
“vectors” other than column vectors.


Matrix spaces The vector space M contains all 2 by 2 matrices. Its dimension is 4. 


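One basis is $A_1, A_2, A_3, A_4 = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}.$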


Those matrices are linearly independent. We are not looking at their columns, but at the 
whole matrix. Combinations of those four matrices can produce any matrix in M, so they 
span the space: 


Every A combines the basis matrices:

$$c_1A_1 + c_2A_2 + c_3A_3 + c_4A_4 = \begin{bmatrix} c_1 & c_2 \\ c_3 & c_4 \end{bmatrix} = A.$$


A is zero only if the c’s are all zero. This proves independence of $A_1, A_2, A_3, A_4$.

The three matrices $A_1, A_2, A_4$ are a basis for a subspace: the upper triangular
matrices. Its dimension is 3. $A_1$ and $A_4$ are a basis for the diagonal matrices. What is
a basis for the symmetric matrices? Keep $A_1$ and $A_4$, and throw in $A_2 + A_3$.

To push this further, think about the space of all n by n matrices. One possible basis
uses matrices that have only a single nonzero entry (that entry is 1). There are $n^2$ positions
for that 1, so there are $n^2$ basis matrices:


The dimension of the whole n by n matrix space is $n^2$.

The dimension of the subspace of upper triangular matrices is $\frac{1}{2}n^2 + \frac{1}{2}n$.

The dimension of the subspace of diagonal matrices is n.

The dimension of the subspace of symmetric matrices is $\frac{1}{2}n^2 + \frac{1}{2}n$ (why?).
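
One way to see where that count comes from (a sketch of the counting argument, not a full proof): an upper triangular matrix is free to choose its n diagonal entries and the $(n^2 - n)/2$ entries above the diagonal. A symmetric matrix is determined by the same positions, because each entry below the diagonal must copy the entry above it.

```latex
\underbrace{n}_{\text{diagonal}} \;+\; \underbrace{\frac{n^2 - n}{2}}_{\text{above the diagonal}}
\;=\; \frac{1}{2}n^2 + \frac{1}{2}n ,
\qquad\text{and for } n = 2:\quad 2 + 1 = 3 .
```

For n = 2 this agrees with the three-matrix bases found above for the upper triangular matrices and for the symmetric matrices.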


Function spaces The equations $d^2y/dx^2 = 0$ and $d^2y/dx^2 = -y$ and $d^2y/dx^2 = y$
involve the second derivative. In calculus we solve to find the functions y(x):


y'' = 0 is solved by any linear function y = cx + d
y'' = -y is solved by any combination y = c sin x + d cos x
y'' = y is solved by any combination $y = ce^x + de^{-x}$.


That solution space for y'' = -y has two basis functions: sin x and cos x. The space
for y'' = 0 has x and 1. It is the “nullspace” of the second derivative! The dimension is 2
in each case (these are second-order equations).

The solutions of y'' = 2 don’t form a subspace, because the right side b = 2 is not zero. A
particular solution is $y(x) = x^2$. The complete solution is $y(x) = x^2 + cx + d$. All
those functions satisfy y'' = 2. Notice the particular solution plus any function cx + d
in the nullspace. A linear differential equation is like a linear matrix equation Ax = b.
But we solve it by calculus instead of linear algebra.
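
The parallel with Ax = b can be written out side by side. This is only a restatement of the paragraph above in symbols; the names $y_p$, $y_n$, $x_p$, $x_n$ (particular and nullspace parts) are labels I am assuming for the comparison.

```latex
\begin{aligned}
y'' = 2:&\quad y = y_p + y_n = x^2 + (cx + d), &\quad (x^2)'' + (cx + d)'' &= 2 + 0 = 2 \\
Ax = b:&\quad x = x_p + x_n, &\quad Ax_p + Ax_n &= b + 0 = b
\end{aligned}
```

In both lines the nullspace part contributes zero, so it can be added freely without disturbing the right side.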

We end here with the space Z that contains only the zero vector. The dimension of this 
space is zero. The empty set (containing no vectors) is a basis for Z. We can never allow 
the zero vector into a basis, because then linear independence is lost. 


REVIEW OF THE KEY IDEAS


1. The columns of A are independent if x = 0 is the only solution to Ax = 0.

2. The vectors $v_1, \ldots, v_n$ span a space if their combinations fill that space.

3. A basis consists of linearly independent vectors that span the space. Every vector
in the space is a unique combination of the basis vectors.

4. All bases for a space have the same number of vectors. This number of vectors in a
basis is the dimension of the space.

5. The pivot columns are one basis for the column space. The dimension is r.


WORKED EXAMPLES


3.5 A Start with the vectors $v_1 = (1, 2, 0)$ and $v_2 = (2, 3, 0)$. (a) Are they linearly
independent? (b) Are they a basis for any space? (c) What space V do they span?
(d) What is the dimension of V? (e) Which matrices A have V as their column space?
(f) Which matrices have V as their nullspace? (g) Describe all vectors $v_3$ that complete
a basis $v_1, v_2, v_3$ for $R^3$.


Solution 
(a) $v_1$ and $v_2$ are independent: the only combination to give 0 is $0v_1 + 0v_2$.

(b) Yes, they are a basis for the space they span.

(c) That space V contains all vectors (x, y, 0). It is the xy plane in $R^3$.

(d) The dimension of V is 2 since the basis contains two vectors.

(e) This V is the column space of any 3 by n matrix A of rank 2, if every column is a
combination of $v_1$ and $v_2$. In particular A could just have columns $v_1$ and $v_2$.

(f) This V is the nullspace of any m by 3 matrix B of rank 1, if every row is a multiple
of (0, 0, 1). In particular take B = [ 0 0 1 ]. Then $Bv_1 = 0$ and $Bv_2 = 0$.

(g) Any third vector $v_3 = (a, b, c)$ will complete a basis for $R^3$ provided c ≠ 0.


3.5 B Start with three independent vectors w1, w2, w3. Take combinations of those 
vectors to produce v1, v2, v3. Write the combinations in matrix form as V = WM: 


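$$\begin{aligned} v_1 &= w_1 + w_2 \\ v_2 &= w_1 + 2w_2 + w_3 \\ v_3 &= w_2 + cw_3 \end{aligned} \qquad\text{which is}\qquad \begin{bmatrix} v_1 & v_2 & v_3 \end{bmatrix} = \begin{bmatrix} w_1 & w_2 & w_3 \end{bmatrix} \begin{bmatrix} 1 & 1 & 0 \\ 1 & 2 & 1 \\ 0 & 1 & c \end{bmatrix}$$


What is the test on a matrix V to see if its columns are linearly independent? If c ≠ 1 show
that $v_1, v_2, v_3$ are linearly independent. If c = 1 show that the v’s are linearly dependent.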


Solution The test on V for independence of its columns was in our first definition: 
The nullspace of V must contain only the zero vector. Then x = (0,0,0) is the only 
combination of the columns that gives Vx = zero vector. 

If c = 1 in our problem, we can see dependence in two ways. First, $v_1 + v_3$ will be
the same as $v_2$. (If you add $w_1 + w_2$ to $w_2 + w_3$ you get $w_1 + 2w_2 + w_3$ which is $v_2$.)
In other words $v_1 - v_2 + v_3 = 0$, which says that the v’s are not independent.

The other way is to look at the nullspace of M. If c = 1, the vector x = (1, -1, 1) is in
that nullspace, and Mx = 0. Then certainly WMx = 0 which is the same as Vx = 0. So
the v’s are dependent. This specific x = (1, -1, 1) from the nullspace tells us again that
$v_1 - v_2 + v_3 = 0$.

Now suppose c ≠ 1. Then the matrix M is invertible. So if x is any nonzero vector we
know that Mx is nonzero. Since the w’s are given as independent, we further know that
WMx is nonzero. Since V = WM, this says that x is not in the nullspace of V. In other
words $v_1, v_2, v_3$ are independent.

The general rule is “independent v’s from independent w’s when M is invertible”.
And if these vectors are in $R^3$, they are not only independent, they are a basis for $R^3$.
“Basis of v’s from basis of w’s when the change of basis matrix M is invertible.”
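
The claim that M is invertible exactly when c ≠ 1 can be confirmed with a determinant. This is a side calculation that assumes the determinant test for invertibility, which this section itself does not use. Expanding along the first row of the matrix M from V = WM:

```latex
\det M =
\begin{vmatrix} 1 & 1 & 0 \\ 1 & 2 & 1 \\ 0 & 1 & c \end{vmatrix}
= 1\,(2c - 1) \;-\; 1\,(c - 0) \;+\; 0
= c - 1 .
```

The determinant is zero only for c = 1, which is the one value where x = (1, -1, 1) lands in the nullspace of M.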


3.5 C (Important example) Suppose $v_1, \ldots, v_n$ is a basis for $R^n$ and the n by n matrix
A is invertible. Show that $Av_1, \ldots, Av_n$ is also a basis for $R^n$.


Solution In matrix language: Put the basis vectors $v_1, \ldots, v_n$ in the columns of an
invertible(!) matrix V. Then $Av_1, \ldots, Av_n$ are the columns of AV. Since A is invertible,
so is AV and its columns give a basis.

In vector language: Suppose $c_1Av_1 + \cdots + c_nAv_n = 0$. This is Av = 0 with
$v = c_1v_1 + \cdots + c_nv_n$. Multiply by $A^{-1}$ to reach v = 0. By linear independence of the v’s,
all $c_i = 0$. This shows that the Av’s are independent.

To show that the Av’s span $R^n$, solve $c_1Av_1 + \cdots + c_nAv_n = b$ which is the same as
$c_1v_1 + \cdots + c_nv_n = A^{-1}b$. Since the v’s are a basis, this must be solvable.


Problem Set 3.5 


Questions 1-10 are about linear independence and linear dependence. 


1 Show that v1, v2, v3 are independent but v1, v2, v3, v4 are dependent: 
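$$v_1 = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} \quad v_2 = \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix} \quad v_3 = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} \quad v_4 = \begin{bmatrix} 2 \\ 3 \\ 4 \end{bmatrix}$$


Solve $c_1v_1 + c_2v_2 + c_3v_3 + c_4v_4 = 0$ or Ax = 0. The v’s go in the columns of A.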


2 (Recommended) Find the largest possible number of independent vectors among 


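$$v_1 = \begin{bmatrix} 1 \\ -1 \\ 0 \\ 0 \end{bmatrix} \quad v_2 = \begin{bmatrix} 1 \\ 0 \\ -1 \\ 0 \end{bmatrix} \quad v_3 = \begin{bmatrix} 1 \\ 0 \\ 0 \\ -1 \end{bmatrix} \quad v_4 = \begin{bmatrix} 0 \\ 1 \\ -1 \\ 0 \end{bmatrix} \quad v_5 = \begin{bmatrix} 0 \\ 1 \\ 0 \\ -1 \end{bmatrix} \quad v_6 = \begin{bmatrix} 0 \\ 0 \\ 1 \\ -1 \end{bmatrix}$$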


3 Prove that if a = 0 or d = 0 or f = 0 (3 cases), the columns of U are dependent:

$$U = \begin{bmatrix} a & b & c \\ 0 & d & e \\ 0 & 0 & f \end{bmatrix}.$$


4 If a, d, f in Question 3 are all nonzero, show that the only solution to Ux = 0 is
x = 0. Then the upper triangular U has independent columns.


5 Decide the dependence or independence of 


(a) the vectors (1, 3, 2) and (2, 1, 3) and (3, 2, 1)
(b) the vectors (1, -3, 2) and (2, 1, -3) and (-3, 2, 1).


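6 Choose three independent columns of U. Then make two other choices. Do the same
for A.

$$U = \begin{bmatrix} 2 & 3 & 4 & 1 \\ 0 & 6 & 7 & 0 \\ 0 & 0 & 0 & 9 \\ 0 & 0 & 0 & 0 \end{bmatrix} \quad\text{and}\quad A = \begin{bmatrix} 2 & 3 & 4 & 1 \\ 0 & 6 & 7 & 0 \\ 0 & 0 & 0 & 9 \\ 4 & 6 & 8 & 2 \end{bmatrix}.$$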


7 If $w_1, w_2, w_3$ are independent vectors, show that the differences $v_1 = w_2 - w_3$ and
$v_2 = w_1 - w_3$ and $v_3 = w_1 - w_2$ are dependent. Find a combination of the v’s that
gives zero. Which matrix A in $[\,v_1\ v_2\ v_3\,] = [\,w_1\ w_2\ w_3\,]\,A$ is singular?


8 If $w_1, w_2, w_3$ are independent vectors, show that the sums $v_1 = w_2 + w_3$ and
$v_2 = w_1 + w_3$ and $v_3 = w_1 + w_2$ are independent. (Write $c_1v_1 + c_2v_2 + c_3v_3 = 0$
in terms of the w’s. Find and solve equations for the c’s, to show they are zero.)


9 Suppose $v_1, v_2, v_3, v_4$ are vectors in $R^3$.

(a) These four vectors are dependent because ____.
(b) The two vectors $v_1$ and $v_2$ will be dependent if ____.
(c) The vectors $v_1$ and (0, 0, 0) are dependent because ____.


10 Find two independent vectors on the plane x + 2y - 3z - t = 0 in $R^4$. Then find three
independent vectors. Why not four? This plane is the nullspace of what matrix?


Questions 11-15 are about the space spanned by a set of vectors. Take all linear
combinations of the vectors.


11 Describe the subspace of $R^3$ (is it a line or plane or $R^3$?) spanned by

(a) the two vectors (1, 1, -1) and (-1, -1, 1)
(b) the three vectors (0, 1, 1) and (1, 1, 0) and (0, 0, 0)
(c) all vectors in $R^3$ with whole number components
(d) all vectors with positive components.


12 The vector b is in the subspace spanned by the columns of A when ____ has a
solution. The vector c is in the row space of A when ____ has a solution.

True or false: If the zero vector is in the row space, the rows are dependent.


13 Find the dimensions of these 4 spaces. Which two of the spaces are the same?
(a) column space of A, (b) column space of U, (c) row space of A, (d) row space of U:

$$A = \begin{bmatrix} 1 & 1 & 0 \\ 1 & 3 & 1 \\ 3 & 1 & -1 \end{bmatrix} \quad\text{and}\quad U = \begin{bmatrix} 1 & 1 & 0 \\ 0 & 2 & 1 \\ 0 & 0 & 0 \end{bmatrix}.$$


14 v + w and v - w are combinations of v and w. Write v and w as combinations of
v + w and v - w. The two pairs of vectors ____ the same space. When are they a
basis for the same space?


Questions 15-25 are about the requirements for a basis. 


15 If $v_1, \ldots, v_n$ are linearly independent, the space they span has dimension ____.
These vectors are a ____ for that space. If the vectors are the columns of an m by
n matrix, then m is ____ than n. If m = n, that matrix is ____.


16 Find a basis for each of these subspaces of $R^4$:

(a) All vectors whose components are equal.
(b) All vectors whose components add to zero.
(c) All vectors that are perpendicular to (1, 1, 0, 0) and (1, 0, 1, 1).
(d) The column space and the nullspace of I (4 by 4).


17 Find three different bases for the column space of $U = \begin{bmatrix} 1 & 0 & 1 & 0 & 1 \\ 0 & 1 & 0 & 1 & 0 \end{bmatrix}$. Then find two
different bases for the row space of U.


18 Suppose $v_1, v_2, \ldots, v_6$ are six vectors in $R^4$.

(a) Those vectors (do)(do not)(might not) span $R^4$.
(b) Those vectors (are)(are not)(might be) linearly independent.
(c) Any four of those vectors (are)(are not)(might be) a basis for $R^4$.


19 The columns of A are n vectors from $R^m$. If they are linearly independent, what is
the rank of A? If they span $R^m$, what is the rank? If they are a basis for $R^m$, what
then? Looking ahead: The rank r counts the number of ____ columns.


20 Find a basis for the plane x - 2y + 3z = 0 in $R^3$. Then find a basis for the intersection
of that plane with the xy plane. Then find a basis for all vectors perpendicular to the
plane.


21 Suppose the columns of a 5 by 5 matrix A are a basis for $R^5$.

(a) The equation Ax = 0 has only the solution x = 0 because ____.
(b) If b is in $R^5$ then Ax = b is solvable because the basis vectors ____ $R^5$.

Conclusion: A is invertible. Its rank is 5. Its rows are also a basis for $R^5$.


22 Suppose S is a 5-dimensional subspace of $R^6$. True or false (example if false):

(a) Every basis for S can be extended to a basis for $R^6$ by adding one more vector.

(b) Every basis for $R^6$ can be reduced to a basis for S by removing one vector.


23 U comes from A by subtracting row 1 from row 3: 


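$$A = \begin{bmatrix} 1 & 3 & 2 \\ 0 & 1 & 1 \\ 1 & 3 & 2 \end{bmatrix} \quad\text{and}\quad U = \begin{bmatrix} 1 & 3 & 2 \\ 0 & 1 & 1 \\ 0 & 0 & 0 \end{bmatrix}.$$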


Find bases for the two column spaces. Find bases for the two row spaces. Find bases 
for the two nullspaces. Which spaces stay fixed in elimination? 


24 True or false (give a good reason): 


(a) If the columns of a matrix are dependent, so are the rows. 
(b) The column space of a 2 by 2 matrix is the same as its row space. 
(c) The column space of a 2 by 2 matrix has the same dimension as its row space. 


(d) The columns of a matrix are a basis for the column space. 


25 For which numbers c and d do these matrices have rank 2? 


Questions 26-30 are about spaces where the “vectors” are matrices. 
26 Find a basis (and the dimension) for each of these subspaces of 3 by 3 matrices: 


(a) All diagonal matrices.
(b) All symmetric matrices ($A^T = A$).
(c) All skew-symmetric matrices ($A^T = -A$).


27 Construct six linearly independent 3 by 3 echelon matrices $U_1, \ldots, U_6$.


28 Find a basis for the space of all 2 by 3 matrices whose columns add to zero. Find a 
basis for the subspace whose rows also add to zero. 


29 What subspace of 3 by 3 matrices is spanned (take all combinations) by 


(a) the invertible matrices? 
(b) the rank one matrices? 


(c) the identity matrix? 


30 Find a basis for the space of 2 by 3 matrices whose nullspace contains (2, 1, 1). 


Questions 31-35 are about spaces where the “vectors” are functions.

31 (a) Find all functions that satisfy dy/dx = 0.
(b) Choose a particular function that satisfies dy/dx = 3.
(c) Find all functions that satisfy dy/dx = 3.


32 The cosine space $F_3$ contains all combinations y(x) = A cos x + B cos 2x + C cos 3x.
Find a basis for the subspace with y(0) = 0.


33 Find a basis for the space of functions that satisfy 


34 Suppose $y_1(x), y_2(x), y_3(x)$ are three different functions of x. The vector space
they span could have dimension 1, 2, or 3. Give an example of $y_1, y_2, y_3$ to show
each possibility.


35 Find a basis for the space of polynomials p(x) of degree < 3. Find a basis for the 
subspace with p(1) = 0. 


36 Find a basis for the space S of vectors (a, b, c, d) with a + c + d = 0 and also for
the space T with a + b = 0 and c = 2d. What is the dimension of the intersection
S ∩ T?


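37 If AS = SA for the shift matrix S, show that A must have this special form:

$$\text{If}\quad \begin{bmatrix} a & b & c \\ d & e & f \\ g & h & i \end{bmatrix} \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix} = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix} \begin{bmatrix} a & b & c \\ d & e & f \\ g & h & i \end{bmatrix} \quad\text{then}\quad A = \begin{bmatrix} a & b & c \\ 0 & a & b \\ 0 & 0 & a \end{bmatrix}.$$


“The subspace of matrices that commute with the shift S has dimension ____.”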


38 Which of the following are bases for $R^3$?

(a) (1, 2, 0) and (0, 1, -1)
(b) (1, 1, -1), (2, 3, 4), (4, 1, -1), (0, 1, -1)
(c) (1, 2, 2), (-1, 2, 1), (0, 8, 0)
(d) (1, 2, 2), (-1, 2, 1), (0, 8, 6)


39 Suppose A is 5 by 4 with rank 4. Show that Ax = b has no solution when the 5 by 5
matrix [ A b ] is invertible. Show that Ax = b is solvable when [ A b ] is singular.


40 (a) Find a basis for all solutions to $d^4y/dx^4 = y(x)$.
(b) Find a particular solution to $d^4y/dx^4 = y(x) + 1$. Find the complete solution.


Challenge Problems


41 Write the 3 by 3 identity matrix as a combination of the other five permutation
matrices! Then show that those five matrices are linearly independent. (Assume a
combination gives $c_1P_1 + \cdots + c_5P_5$ = zero matrix, and check entries to prove $c_i$
is zero.) The five permutations are a basis for the subspace of 3 by 3 matrices with
row and column sums all equal.


42 Choose $x = (x_1, x_2, x_3, x_4)$ in $R^4$. It has 24 rearrangements like $(x_2, x_1, x_3, x_4)$
and $(x_4, x_3, x_1, x_2)$. Those 24 vectors, including x itself, span a subspace S. Find
specific vectors x so that the dimension of S is: (a) zero, (b) one, (c) three, (d) four.


43 Intersections and sums have dim(V) + dim(W) = dim(V ∩ W) + dim(V + W).
Start with a basis $u_1, \ldots, u_r$ for the intersection V ∩ W. Extend with $v_1, \ldots, v_s$
to a basis for V, and separately with $w_1, \ldots, w_t$ to a basis for W. Prove that the u’s,
v’s and w’s together are independent. The dimensions have (r + s) + (r + t) =
(r) + (r + s + t) as desired.


44 Mike Artin suggested a neat higher-level proof of that dimension formula in
Problem 43. From all inputs v in V and w in W, the “sum transformation” produces v + w.
Those outputs fill the space V + W. The nullspace contains all pairs v = u, w = -u
for vectors u in V ∩ W. (Then v + w = u - u = 0.) So dim(V + W) + dim(V ∩ W)
equals dim(V) + dim(W) (input dimension from V and W) by the crucial formula

dimension of outputs + dimension of nullspace = dimension of inputs.

Problem For an m by n matrix of rank r, what are those 3 dimensions? Outputs =
column space. This question will be answered in Section 3.6; can you do it now?


45 Inside $R^n$, suppose dimension (V) + dimension (W) > n. Show that some nonzero
vector is in both V and W.


46 Suppose A is 10 by 10 and $A^2 = 0$ (zero matrix). This means that the column space
of A is contained in the ____. If A has rank r, those subspaces have dimension
r ≤ 10 - r. So the rank is r ≤ 5.

(This problem was added to the second printing: If $A^2 = 0$ it says that r ≤ n/2.)
3.6 Dimensions of the Four Subspaces


The main theorem in this chapter connects rank and dimension. The rank of a matrix 
is the number of pivots. The dimension of a subspace is the number of vectors in a basis. 
We count pivots or we count basis vectors. The rank of A reveals the dimensions of 
all four fundamental subspaces. Here are the subspaces, including the new one. 

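Two subspaces come directly from A, and the other two from $A^T$:


Four Fundamental Subspaces

1. The row space is C($A^T$), a subspace of $R^n$.
2. The column space is C(A), a subspace of $R^m$.
3. The nullspace is N(A), a subspace of $R^n$.
4. The left nullspace is N($A^T$), a subspace of $R^m$. This is our new space.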


In this book the column space and nullspace came first. We know C(A) and N(A) pretty
well. Now the other two subspaces come forward. The row space contains all combinations
of the rows. This is the column space of $A^T$.

For the left nullspace we solve $A^Ty = 0$; that system is n by m. This is the nullspace
of $A^T$. The vectors y go on the left side of A when the equation is written as $y^TA = 0^T$. The
matrices A and $A^T$ are usually different. So are their column spaces and their nullspaces.
But those spaces are connected in an absolutely beautiful way.

Part 1 of the Fundamental Theorem finds the dimensions of the four subspaces. One
fact stands out: The row space and column space have the same dimension r (the rank of
the matrix). The other important fact involves the two nullspaces:


N(A) and N($A^T$) have dimensions n - r and m - r, to make up the full n and m.


Part 2 of the Fundamental Theorem will describe how the four subspaces fit together
(two in $R^n$ and two in $R^m$). That completes the “right way” to understand every Ax = b.
Stay with it—you are doing real mathematics. 


The Four Subspaces for R 


Suppose A is reduced to its row echelon form R. For that special form, the four subspaces 
are easy to identify. We will find a basis for each subspace and check its dimension. Then 
we watch how the subspaces change (two of them don’t change!) as we look back at A. 
The main point is that the four dimensions are the same for A and R. 

As a specific 3 by 5 example, look at the four subspaces for the echelon matrix R: 


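$$\begin{matrix} m = 3 \\ n = 5 \\ r = 2 \end{matrix} \qquad R = \begin{bmatrix} 1 & 3 & 5 & 0 & 7 \\ 0 & 0 & 0 & 1 & 2 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix} \qquad \begin{matrix} \text{pivot rows 1 and 2} \\ \text{pivot columns 1 and 4} \end{matrix}$$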


The rank of this matrix R is r = 2 (two pivots). Take the four subspaces in order. 

1. The row space of R has dimension 2, matching the rank.


Reason: The first two rows are a basis. The row space contains combinations of all three
rows, but the third row (the zero row) adds nothing new. So rows 1 and 2 span the row
space C($R^T$).

The pivot rows 1 and 2 are independent. That is obvious for this example, and it is 
always true. If we look only at the pivot columns, we see the r by r identity matrix. 
There is no way to combine its rows to give the zero row (except by the combination with 
all coefficients zero). So the r pivot rows are a basis for the row space. 


The dimension of the row space is the rank r. The nonzero rows of R form a basis. 


2. The column space of R also has dimension r = 2.


Reason: The pivot columns 1 and 4 form a basis for C (R). They are independent because 
they start with the r by r identity matrix. No combination of those pivot columns can give 
the zero column (except the combination with all coefficients zero). And they also span the 
column space. Every other (free) column is a combination of the pivot columns. Actually 
the combinations we need are the three special solutions ! 


Column 2 is 3 (column 1). The special solution is (-3, 1, 0, 0, 0).
Column 3 is 5 (column 1). The special solution is (-5, 0, 1, 0, 0).
Column 5 is 7 (column 1) + 2 (column 4). That solution is (-7, 0, 0, -2, 1).


The pivot columns are independent, and they span, so they are a basis for C (R). 


The dimension of the column space is the rank r. The pivot columns form a basis. 


3. The nullspace of R has dimension n - r = 5 - 2 = 3. The free variables $x_2, x_3, x_5$
give three special solutions:

$$s_2 = \begin{bmatrix} -3 \\ 1 \\ 0 \\ 0 \\ 0 \end{bmatrix} \quad s_3 = \begin{bmatrix} -5 \\ 0 \\ 1 \\ 0 \\ 0 \end{bmatrix} \quad s_5 = \begin{bmatrix} -7 \\ 0 \\ 0 \\ -2 \\ 1 \end{bmatrix} \qquad \begin{matrix} Rx = 0 \text{ has the complete solution} \\ x = x_2s_2 + x_3s_3 + x_5s_5 \end{matrix}$$


There is a special solution for each free variable. With n variables and r pivot variables,
that leaves n - r free variables and special solutions. N(R) has dimension n - r.


The nullspace has dimension n - r. The special solutions form a basis.


The special solutions are independent, because they contain the identity matrix in rows 2, 3,
5. All solutions are combinations of special solutions, $x = x_2s_2 + x_3s_3 + x_5s_5$, because
this puts $x_2$, $x_3$ and $x_5$ in the correct positions. Then the pivot variables $x_1$ and $x_4$ are
totally determined by the equations Rx = 0.
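
The recipe "set one free variable to 1, the others to 0, and solve for the pivot variables" can be sketched in a few lines of Python. This is a minimal illustration that assumes its input is already in reduced row echelon form; the function name `special_solutions` is my own and is not taken from the Teaching Codes.

```python
from fractions import Fraction

def special_solutions(R):
    """Special solutions of Rx = 0, assuming R is in reduced row echelon form."""
    n = len(R[0])
    # The pivot column of each nonzero row is the position of its first nonzero entry.
    pivots = []
    for row in R:
        for j, entry in enumerate(row):
            if entry != 0:
                pivots.append(j)
                break
    free = [j for j in range(n) if j not in pivots]
    solutions = []
    for f in free:
        x = [Fraction(0)] * n
        x[f] = Fraction(1)                 # this free variable is 1, the others stay 0
        for i, p in enumerate(pivots):
            x[p] = -Fraction(R[i][f])      # pivot variable forced by row i of Rx = 0
        solutions.append(x)
    return solutions

R = [[1, 3, 5, 0, 7],
     [0, 0, 0, 1, 2],
     [0, 0, 0, 0, 0]]

S = [[int(v) for v in s] for s in special_solutions(R)]
for s in S:
    print(s)

# Every special solution must satisfy Rx = 0.
checks = [[sum(r * x for r, x in zip(row, s)) for row in R] for s in S]
print(checks)
```

The number of vectors returned is n - r = 3, one per free column, which is the dimension count stated above.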


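4. The nullspace of $R^T$ (the left nullspace of R) has dimension m - r = 3 - 2 = 1.


Reason: The equation $R^Ty = 0$ looks for combinations of the columns of $R^T$ (the rows
of R) that produce zero. This equation $R^Ty = 0$ or $y^TR = 0^T$ is

$$\text{Left nullspace} \qquad \begin{aligned} & y_1\,[\,1,\ 3,\ 5,\ 0,\ 7\,] \\ +\ & y_2\,[\,0,\ 0,\ 0,\ 1,\ 2\,] \\ +\ & y_3\,[\,0,\ 0,\ 0,\ 0,\ 0\,] \\ =\ & [\,0,\ 0,\ 0,\ 0,\ 0\,] \end{aligned} \tag{1}$$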


The solutions $y_1, y_2, y_3$ are pretty clear. We need $y_1 = 0$ and $y_2 = 0$. The variable $y_3$ is
free (it can be anything). The nullspace of $R^T$ contains all vectors $y = (0, 0, y_3)$. It is the
line of all multiples of the basis vector (0, 0, 1).

In all cases R ends with m - r zero rows. Every combination of these m - r rows
gives zero. These are the only combinations of the rows of R that give zero, because the
pivot rows are linearly independent. The left nullspace of R contains all these solutions
$y = (0, \ldots, 0, y_{r+1}, \ldots, y_m)$ to $R^Ty = 0$.


If A is m by n of rank r, its left nullspace has dimension m - r.


To produce a zero combination, y must start with r zeros. This leaves dimension m - r.

Why is this a “left nullspace”? The reason is that $R^Ty = 0$ can be transposed to
$y^TR = 0^T$. Now $y^T$ is a row vector to the left of R. You see the y’s in equation (1)
multiplying the rows. This subspace came fourth, and some linear algebra books omit
it, but that misses the beauty of the whole subject.


So far this is proved for echelon matrices R. Figure 3.5 shows the same for A. 


The Four Subspaces for A 


We have a job still to do. The subspace dimensions for A are the same as for R. 
The job is to explain why. A is now any matrix that reduces to R = rref(A). 


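$$\text{A reduces to R} \qquad A = \begin{bmatrix} 1 & 3 & 5 & 0 & 7 \\ 0 & 0 & 0 & 1 & 2 \\ 1 & 3 & 5 & 1 & 9 \end{bmatrix} \qquad \text{Notice } C(A) \neq C(R) \tag{2}$$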

[Figure: The big picture. In $R^n$: the row space C($A^T$), all $A^Ty$, dimension r, and the nullspace N(A), Ax = 0, dimension n - r. In $R^m$: the column space C(A), all Ax, dimension r, and the left nullspace N($A^T$), $A^Ty = 0$, dimension m - r.]

Figure 3.5: The dimensions of the Four Fundamental Subspaces (for R and for A).


An elimination matrix takes A to R. The big picture (Figure 3.5) applies to both. The 
invertible matrix E is the product of the elementary matrices that reduce A to R: 


A to R and back       EA = R and $A = E^{-1}R$       (3)


1 A has the same row space as R. Same dimension r and same basis.


Reason: Every row of A is a combination of the rows of R. Also every row of R is a 
combination of the rows of A. Elimination changes rows, but not row spaces. 

Since A has the same row space as R, we can choose the first r rows of R as a basis.
Or we could choose r suitable rows of the original A. They might not always be the first r
rows of A, because those could be dependent. The good r rows of A are the ones that end
up as pivot rows in R.


2 The column space of A has dimension r. For every matrix this is essential: 
The number of independent columns equals the number of independent rows. 


Wrong reason: “A and R have the same column space.” This is false. The columns of 
R often end in zeros. The columns of A don’t often end in zeros. The column spaces are 
different, but their dimensions are the same—equal to r. 


Right reason: The same combinations of the columns are zero (or nonzero) for A and R. 
Say that another way: Ax = 0 exactly when Rx = 0. The r pivot columns (of both) are 
independent. 


Conclusion The r pivot columns of A are a basis for its column space.


3 A has the same nullspace as R. Same dimension n - r and same basis.


Reason: The elimination steps don’t change the solutions. The special solutions are a
basis for this nullspace (as we always knew). There are n - r free variables, so the dimension
of the nullspace is n - r. Notice that r + (n - r) equals n:

(dimension of column space) + (dimension of nullspace) = dimension of $R^n$.


4 The left nullspace of A (the nullspace of $A^T$) has dimension m - r.


Reason: $A^T$ is just as good a matrix as A. When we know the dimensions for every A,
we also know them for $A^T$. Its column space was proved to have dimension r. Since $A^T$ is
n by m, the “whole space” is now $R^m$. The counting rule for A was r + (n - r) = n. The
counting rule for $A^T$ is r + (m - r) = m. We now have all details of the main theorem:
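
The two counting rules can be checked on the matrix A of equation (2). The following is a minimal Python sketch: the helpers `rank` and `transpose` are written here for illustration, counting pivots by elimination with exact fractions, and are not the book’s Teaching Codes.

```python
from fractions import Fraction

def rank(M):
    """Count the pivots found by Gaussian elimination (exact arithmetic)."""
    rows = [[Fraction(x) for x in row] for row in M]
    m, n = len(rows), len(rows[0])
    r = 0                                    # number of pivots found so far
    for col in range(n):
        # Look for a nonzero entry in this column, at or below row r.
        pivot = next((i for i in range(r, m) if rows[i][col] != 0), None)
        if pivot is None:
            continue                         # no pivot here: a free column
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, m):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
        if r == m:
            break
    return r

def transpose(M):
    return [list(col) for col in zip(*M)]

A = [[1, 3, 5, 0, 7],
     [0, 0, 0, 1, 2],
     [1, 3, 5, 1, 9]]
m, n = len(A), len(A[0])
r = rank(A)

dims = {
    "row space": rank(transpose(A)),   # dimension of C(A^T)
    "column space": r,                 # dimension of C(A)
    "nullspace": n - r,                # counting rule in R^n
    "left nullspace": m - r,           # counting rule in R^m
}
print(dims)
```

The rank of $A^T$ is computed independently of the rank of A, so seeing the same number twice is a small check that row rank equals column rank for this example.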


The column space and row space both have dimension r.
The nullspaces have dimensions n - r and m - r.


By concentrating on spaces of vectors, not on individual numbers c or vectors, we get these 
clean rules. You will soon take them for granted—eventually they begin to look obvious. 
But if you write down an 11 by 17 matrix with 187 nonzero entries, I don’t think most 
people would see why these facts are true: 


Two key facts

dimension of C(A) = dimension of C($A^T$) = rank of A
dimension of C(A) + dimension of N(A) = 17.


Example 1 A = [ 1 2 3 ] has m = 1 and n = 3 and rank r = 1.

The row space is a line in $R^3$. The nullspace is the plane $Ax = x_1 + 2x_2 + 3x_3 = 0$. This
plane has dimension 2 (which is 3 - 1). The dimensions add to 1 + 2 = 3.

The columns of this 1 by 3 matrix are in $R^1$! The column space is all of $R^1$. The left
nullspace contains only the zero vector. The only solution to $A^Ty = 0$ is y = 0; no other
multiple of [ 1 2 3 ] gives the zero row. Thus N($A^T$) is Z, the zero space with dimension
0 (which is m - r). In $R^m$ the dimensions add to 1 + 0 = 1.


Example 2 $A = \begin{bmatrix} 1 & 2 & 3 \\ 2 & 4 & 6 \end{bmatrix}$ has m = 2 with n = 3 and rank r = 1.

The row space is the same line through (1, 2, 3). The nullspace must be the same plane
$x_1 + 2x_2 + 3x_3 = 0$. Their dimensions still add to 1 + 2 = 3.

All columns are multiples of the first column (1, 2). Twice the first row minus the
second row is the zero row. Therefore $A^Ty = 0$ has the solution y = (2, -1). The column
space and left nullspace are perpendicular lines in $R^2$. Dimensions 1 + 1 = 2.

$$\text{Column space = line through } \begin{bmatrix} 1 \\ 2 \end{bmatrix} \qquad \text{Left nullspace = line through } \begin{bmatrix} 2 \\ -1 \end{bmatrix}.$$

If A has three equal rows, its rank is ____. What are two of the y’s in its left nullspace?


The y’s in the left nullspace combine the rows to give the zero row. 
Matrices of Rank One


That last example had rank r = 1—and rank one matrices are special. We can describe 
them all. You will see again that dimension of row space = dimension of column space. 
When r = 1, every row is a multiple of the same row: 


$$A = uv^T \qquad A = \begin{bmatrix} 1 & 2 & 3 \\ 2 & 4 & 6 \\ -3 & -6 & -9 \\ 0 & 0 & 0 \end{bmatrix} \text{ equals } \begin{bmatrix} 1 \\ 2 \\ -3 \\ 0 \end{bmatrix} \text{ times } \begin{bmatrix} 1 & 2 & 3 \end{bmatrix} = uv^T.$$


A column times a row (4 by 1 times 1 by 3) produces a matrix (4 by 3). All rows are
multiples of the row (1, 2, 3). All columns are multiples of the column (1, 2, -3, 0).
The row space is a line in $R^n$, and the column space is a line in $R^m$.


The columns are multiples of u. The rows are multiples of $v^T$. The nullspace is the plane
perpendicular to v. (Ax = 0 means that $u(v^Tx) = 0$ and then $v^Tx = 0$.) It is this
perpendicularity of the subspaces that will be Part 2 of the Fundamental Theorem.
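
A short Python sketch makes the column-times-row picture concrete. It rebuilds the 4 by 3 example from u and v, then tests two vectors perpendicular to v; the particular vectors `x1` and `x2` and the helper `matvec` are my own choices for illustration.

```python
u = [1, 2, -3, 0]          # the column u (4 by 1)
v = [1, 2, 3]              # the row v^T (1 by 3)

# Outer product: entry (i, j) of A = u v^T is u[i] * v[j].
A = [[ui * vj for vj in v] for ui in u]
for row in A:
    print(row)

def matvec(M, x):
    """Multiply the matrix M (a list of rows) by the vector x."""
    return [sum(a * b for a, b in zip(row, x)) for row in M]

# Two independent vectors perpendicular to v = (1, 2, 3).
x1 = [-2, 1, 0]
x2 = [-3, 0, 1]
print(matvec(A, x1), matvec(A, x2))
```

Both products are the zero vector in $R^4$, so x1 and x2 lie in the nullspace. Since n - r = 3 - 1 = 2, these two independent vectors are a basis for it.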


REVIEW OF THE KEY IDEAS


1. The r pivot rows of R are a basis for the row spaces of R and A (same space).

2. The r pivot columns of A (!) are a basis for its column space.

3. The n - r special solutions are a basis for the nullspaces of A and R (same space).

4. The last m - r rows of I are a basis for the left nullspace of R.

5. The last m - r rows of E are a basis for the left nullspace of A.


Note about the four subspaces The Fundamental Theorem looks like pure algebra, but it
has very important applications. My favorites are the networks in Chapter 8 (often
I go there for my next lecture). The equation for y in the left nullspace is $A^Ty = 0$:


Flow into a node equals flow out. Kirchhoff’s Current Law is the “balance equation”.


This is (in my opinion) the most important equation in applied mathematics. All models in 
science and engineering and economics involve a balance—of force or heat flow or charge 
or momentum or money. That balance equation, plus Hooke’s Law or Ohm’s Law or some 
law connecting “potentials” to “flows”, gives a clear framework for applied mathematics. 

My textbook on Computational Science and Engineering develops that framework, 
together with algorithms to solve the equations: Finite differences, finite elements, 
spectral methods, iterative methods, and multigrid. 


WORKED EXAMPLES


3.6 A Find bases and dimensions for all four fundamental subspaces if you know that

$$A = \begin{bmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ 5 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 3 & 0 & 5 \\ 0 & 0 & 1 & 6 \\ 0 & 0 & 0 & 0 \end{bmatrix} = LU = E^{-1}R.$$


By changing only one number in R, change the dimensions of all four subspaces.


Solution This matrix has pivots in columns 1 and 3. Its rank is r = 2. 


Row space Basis (1, 3, 0, 5) and (0, 0, 1, 6) from R. Dimension 2.
Column space Basis (1, 2, 5) and (0, 1, 0) from $E^{-1}$ (and A). Dimension 2.
Nullspace Basis (-3, 1, 0, 0) and (-5, 0, -6, 1) from R. Dimension 2.
Nullspace of $A^T$ Basis (-5, 0, 1) from row 3 of E. Dimension 3 - 2 = 1.


We need to comment on that left nullspace N($A^T$). EA = R says that the last row of E
combines the three rows of A into the zero row of R. So that last row of E is a basis vector
for the left nullspace. If R had two zero rows, then the last two rows of E would be a basis.
(Just like elimination, $y^TA = 0^T$ combines rows of A to give zero rows in R.)

To change all these dimensions we need to change the rank r. One way to do that is to 
change an entry (any entry) in the zero row of R. 
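
The bases in this solution can be checked numerically. The following is a minimal sketch in plain Python (the helper names `matmul`, `matvec`, and `transpose` are illustrative, not from the text): it rebuilds A from the two factors given above and confirms that the nullspace vectors and the left nullspace vector really multiply to zero.

```python
# Worked Example 3.6 A: A = L R, with the two factors given in the text.
L = [[1, 0, 0],
     [2, 1, 0],
     [5, 0, 1]]
R = [[1, 3, 0, 5],
     [0, 0, 1, 6],
     [0, 0, 0, 0]]

def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def matvec(X, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(xij * vj for xij, vj in zip(row, v)) for row in X]

def transpose(X):
    return [list(col) for col in zip(*X)]

A = matmul(L, R)

nullspace_basis = [[-3, 1, 0, 0], [-5, 0, -6, 1]]   # special solutions read from R
left_null_vector = [-5, 0, 1]                       # row 3 of E = L^{-1}

null_checks = [matvec(A, s) for s in nullspace_basis]   # each should be the zero vector
left_check = matvec(transpose(A), left_null_vector)     # A^T y should be zero

print(A)
print(null_checks, left_check)
```

Each product is a zero vector, so the stated vectors do lie in N(A) and N(A^T). Independence and the dimension counts still come from the rank r = 2, as in the solution.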


3.6B Put four 1’s into a 5 by 6 matrix of zeros, keeping the dimension of its row space 
as small as possible. Describe all the ways to make the dimension of its column space as 
small as possible. Describe all the ways to make the dimension of its nullspace as small as 
possible. How to make the sum of the dimensions of all four subspaces small? 


Solution The rank is 1 if the four 1’s go into the same row, or into the same column.
They can also go into two rows and two columns (so $a_{ii} = a_{ij} = a_{ji} = a_{jj} = 1$).
Since the column space and row space always have the same dimensions, this answers the
first two questions: Dimension 1.

The nullspace has its smallest possible dimension 6 − 4 = 2 when the rank is r = 4.
To achieve rank 4, the 1’s must go into four different rows and columns.

You can’t do anything about the sum $r + (n - r) + r + (m - r) = n + m$. It will be
6 + 5 = 11 no matter how the 1’s are placed. The sum is 11 even if there aren’t any 1’s...


If all the other entries of A are 2’s instead of 0’s, how do these answers change? 


Problem Set 3.6 


1 (a) If a 7 by 9 matrix has rank 5, what are the dimensions of the four subspaces? 
What is the sum of all four dimensions?
(b) If a 3 by 4 matrix has rank 3, what are its column space and left nullspace?


2 Find bases and dimensions for the four subspaces associated with A and B: 
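$$A = \begin{bmatrix} 1 & 2 & 4 \\ 2 & 4 & 8 \end{bmatrix} \quad\text{and}\quad B = \begin{bmatrix} 1 & 2 & 4 \\ 2 & 5 & 8 \end{bmatrix}.$$

3 Find a basis for each of the four subspaces associated with A:
$$A = \begin{bmatrix} 0 & 1 & 2 & 3 & 4 \\ 0 & 1 & 2 & 4 & 6 \\ 0 & 0 & 0 & 1 & 2 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 0 & 1 & 1 \end{bmatrix} \begin{bmatrix} 0 & 1 & 2 & 3 & 4 \\ 0 & 0 & 0 & 1 & 2 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}.$$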


4 Construct a matrix with the required property or explain why this is impossible: 


. 1 0 : 
(a) Column space contains | i, E |; row space contains | }], [4]. 


(b) Column space has basis $ | nullspace has basis pi 


(c) Dimension of nullspace = 1 + dimension of left nullspace. 


(d) Left nullspace contains [ } |, row space contains [3]. 


(e) Row space = column space, nullspace ≠ left nullspace.


5 If V is the subspace spanned by (1,1,1) and (2,1,0), find a matrix A that has 


V as its row space. Find a matrix B that has V as its nullspace. 


6 Without elimination, find dimensions and bases for the four subspaces for 


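$$A = \begin{bmatrix} 0 & 3 & 3 & 3 \\ 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 1 \end{bmatrix} \quad\text{and}\quad B = \begin{bmatrix} 1 \\ 4 \\ 5 \end{bmatrix}.$$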


7 Suppose the 3 by 3 matrix A is invertible. Write down bases for the four subspaces 


for A, and also for the 3 by 6 matrix B =[A A]. 


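8 What are the dimensions of the four subspaces for A, B, and C, if I is the 3 by 3
identity matrix and 0 is the 3 by 2 zero matrix?


$$A = \begin{bmatrix} I & 0 \end{bmatrix} \quad\text{and}\quad B = \begin{bmatrix} I & I \\ 0^T & 0^T \end{bmatrix} \quad\text{and}\quad C = \begin{bmatrix} 0 \end{bmatrix}.$$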


9 Which subspaces are the same for these matrices of different sizes? 


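(a) $\begin{bmatrix} A \end{bmatrix}$ and $\begin{bmatrix} A \\ A \end{bmatrix}$ (b) $\begin{bmatrix} A \\ A \end{bmatrix}$ and $\begin{bmatrix} A & A \\ A & A \end{bmatrix}$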


Prove that all three of those matrices have the same rank r.
15


16 


17
If the entries of a 3 by 3 matrix are chosen randomly between 0 and 1, what are the
most likely dimensions of the four subspaces? What if the matrix is 3 by 5? 


(Important) A is an m by n matrix of rank r. Suppose there are right sides b for 
which Ax = b has no solution. 


(a) What are all inequalities (< or ≤) that must be true between m, n, and r?


(b) How do you know that $A^T y = 0$ has solutions other than y = 0?


Construct a matrix with (1,0, 1) and (1,2,0) as a basis for its row space and its 
column space. Why can’t this be a basis for the row space and nullspace? 


True or false (with a reason or a counterexample): 


(a) If m = n then the row space of A equals the column space. 
(b) The matrices A and —A share the same four subspaces. 


(c) If A and B share the same four subspaces then A is a multiple of B. 


Without computing A, find bases for its four fundamental subspaces: 


If you exchange the first two rows of A, which of the four subspaces stay the same? 
If v = (1, 2, 3, 4) is in the left nullspace of A, write down a vector in the left nullspace
of the new matrix. 


Explain why v = (1,0, —1) cannot be a row of A and also in the nullspace. 


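Describe the four subspaces of $\mathbf{R}^3$ associated with


$$A = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix} \quad\text{and}\quad I + A = \begin{bmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{bmatrix}.$$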


(Left nullspace) Add the extra column b and reduce A to echelon form: 


$$\begin{bmatrix} A & b \end{bmatrix} = \begin{bmatrix} 1 & 2 & 3 & b_1 \\ 4 & 5 & 6 & b_2 \\ 7 & 8 & 9 & b_3 \end{bmatrix} \rightarrow \begin{bmatrix} 1 & 2 & 3 & b_1 \\ 0 & -3 & -6 & b_2 - 4b_1 \\ 0 & 0 & 0 & b_3 - 2b_2 + b_1 \end{bmatrix}$$


A combination of the rows of A has produced the zero row. What combination is it?
(Look at $b_3 - 2b_2 + b_1$ on the right side.) Which vectors are in the nullspace of $A^T$
and which are in the nullspace of A?
21


22 


23 


24 


25 


Following the method of Problem 18, reduce A to echelon form and look at zero 
rows. The b column tells which combinations you have taken of the rows: 


ban) foe 
(a) |3 4 b (b) 2 
46 b 2 4 b 
3 2 5 bg 


From the b column after elimination, read off m—r basis vectors in the left nullspace. 
Those y’s are combinations of rows that give zero rows. 


(a) Check that the solutions to Ax = 0 are perpendicular to the rows: 


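$$A = \begin{bmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ 3 & 4 & 1 \end{bmatrix} \begin{bmatrix} 4 & 2 & 0 & 1 \\ 0 & 0 & 1 & 3 \\ 0 & 0 & 0 & 0 \end{bmatrix} = ER.$$


(b) How many independent solutions to $A^T y = 0$? Why is $y^T$ the last row of $E^{-1}$?
Suppose A is the sum of two matrices of rank one: $A = uv^T + wz^T$.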


(a) Which vectors span the column space of A? 

(b) Which vectors span the row space of A? 

(c) The rank is less than 2 if ____ or if ____.

(d) Compute A and its rank if u = z = (1,0,0) and v = w = (0,0, 1). 


Construct $A = uv^T + wz^T$ whose column space has basis (1, 2, 4), (2, 2, 1) and
whose row space has basis (1, 0), (1, 1). Write A as (3 by 2) times (2 by 2). 


Without multiplying matrices, find bases for the row and column spaces of A: 


How do you know from these shapes that A cannot be invertible? 


(Important) $A^T y = d$ is solvable when d is in which of the four subspaces? The
solution y is unique when the ____ contains only the zero vector.


True or false (with a reason or a counterexample): 


(a) A and AT have the same number of pivots. 
(b) A and AT have the same left nullspace. 
(c) If the row space equals the column space then AT = A. 


(d) If $A^T = -A$ then the row space of A equals the column space.
(Rank of AB) If AB = C, the rows of C are combinations of the rows of ____.
So the rank of C is not greater than the rank of ____. Since $B^T A^T = C^T$, the rank
of C is also not greater than the rank of ____.


If a, b, c are given with $a \neq 0$, how would you choose d so that $\begin{bmatrix} a & b \\ c & d \end{bmatrix}$ has rank 1?
Find a basis for the row space and nullspace. Show they are perpendicular!


Find the ranks of the 8 by 8 checkerboard matrix B and the chess matrix C: 


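$$B = \begin{bmatrix} 1 & 0 & 1 & 0 & 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 & 0 & 1 & 0 & 1 \\ 1 & 0 & 1 & 0 & 1 & 0 & 1 & 0 \\ \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot \\ 0 & 1 & 0 & 1 & 0 & 1 & 0 & 1 \end{bmatrix} \quad\text{and}\quad C = \begin{bmatrix} r & n & b & q & k & b & n & r \\ p & p & p & p & p & p & p & p \\ \multicolumn{8}{c}{\text{four zero rows}} \\ p & p & p & p & p & p & p & p \\ r & n & b & q & k & b & n & r \end{bmatrix}$$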


The numbers r,n,b,q,k, p are all different. Find bases for the row space and left 
nullspace of B and C. Challenge problem: Find a basis for the nullspace of C. 


Can tic-tac-toe be completed (5 ones and 4 zeros in A) so that rank (A) = 2 but 
neither side passed up a winning move? 


Challenge Problems 


If $A = uv^T$ is a 2 by 2 matrix of rank 1, redraw Figure 3.5 to show clearly the Four
Fundamental Subspaces. If B produces those same four subspaces, what is the exact 
relation of B to A? 


M is the space of 3 by 3 matrices. Multiply every matrix X in M by 


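$$A = \begin{bmatrix} 1 & 0 & -1 \\ -1 & 1 & 0 \\ 0 & -1 & 1 \end{bmatrix}. \quad \text{Notice: } A \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}.$$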


(a) Which matrices X lead to AX = zero matrix? 


(b) Which matrices have the form AX for some matrix X? 


(a) finds the “nullspace” of that operation AX and (b) finds the “column space”. 
What are the dimensions of those two subspaces of M? Why do the dimensions add 
to (n − r) + r = 9?


Suppose the m by n matrices A and B have the same four subspaces. If they are both 
in row reduced echelon form, prove that F must equal G: 


$$A = \begin{bmatrix} I & F \\ 0 & 0 \end{bmatrix} \qquad B = \begin{bmatrix} I & G \\ 0 & 0 \end{bmatrix}.$$


Chapter 4 


Orthogonality 


4.1 Orthogonality of the Four Subspaces 


Two vectors are orthogonal when their dot product is zero: $v \cdot w = 0$ or $v^T w = 0$. This
chapter moves to orthogonal subspaces and orthogonal bases and orthogonal matrices.
The vectors in two subspaces, and the vectors in a basis, and the vectors in the columns,
all pairs will be orthogonal. Think of $a^2 + b^2 = c^2$ for a right triangle with sides v and w.


Orthogonal vectors $v^T w = 0$ and $\|v\|^2 + \|w\|^2 = \|v + w\|^2$.


The right side is $(v + w)^T (v + w)$. This equals $v^T v + w^T w$ when $v^T w = w^T v = 0$.


Subspaces entered Chapter 3 to throw light on Ax = b. Right away we needed the 
column space (for b) and the nullspace (for x). Then the light turned onto $A^T$, uncovering
two more subspaces. Those four fundamental subspaces reveal what a matrix really does. 

A matrix multiplies a vector: A times x. At the first level this is only numbers. At 
the second level Ax is a combination of column vectors. The third level shows subspaces. 
But I don’t think you have seen the whole picture until you study Figure 4.2. It fits the 
subspaces together, to show the hidden reality of A times x. The 90° angles between 
subspaces are new—and we have to say what those right angles mean. 


The row space is perpendicular to the nullspace. Every row of A is perpendicular to 
every solution of Ax = 0. That gives the 90° angle on the left side of the figure. This 
perpendicularity of subspaces is Part 2 of the Fundamental Theorem of Linear Algebra. 

The column space is perpendicular to the nullspace of AT. When b is outside the 
column space—when we want to solve Ax = b and can’t do it—then this nullspace of 
AT comes into its own. It contains the error e = b — Ax in the “least-squares” solution. 
Least squares is the key application of linear algebra in this chapter. 


Part 1 of the Fundamental Theorem gave the dimensions of the subspaces. The row 
and column spaces have the same dimension r (they are drawn the same size). The two 
nullspaces have the remaining dimensions n — r and m — r. Now we will show that 
the row space and nullspace are orthogonal subspaces inside $\mathbf{R}^n$.

DEFINITION Two subspaces V and W of a vector space are orthogonal if every vector v
in V is perpendicular to every vector w in W:

$$\text{Orthogonal subspaces} \qquad v^T w = 0 \ \text{ for all } v \text{ in } V \text{ and all } w \text{ in } W.$$


Example 1 The floor of your room (extended to infinity) is a subspace V. The line where 
two walls meet is a subspace W (one-dimensional). Those subspaces are orthogonal. Every 
vector up the meeting line is perpendicular to every vector in the floor. 


Example 2 Two walls look perpendicular but they are not orthogonal subspaces! The 
meeting line is in both V and W—and this line is not perpendicular to itself. Two planes 
(dimensions 2 and 2 in $\mathbf{R}^3$) cannot be orthogonal subspaces.

When a vector is in two orthogonal subspaces, it must be zero. It is perpendicular to
itself. It is v and it is w, so $v^T w = v^T v = 0$. This has to be the zero vector.


[Figure 4.1 diagram: an orthogonal line and plane (left); non-orthogonal planes with $v^T w \neq 0$ (right).]


Figure 4.1: Orthogonality is impossible when dim V + dim W > dimension of whole space.


The crucial examples for linear algebra come from the fundamental subspaces. Zero is 
the only point where the nullspace meets the row space. More than that, the nullspace and 
row space of A meet at 90°. This key fact comes directly from Ax = 0: 


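$$Ax = \begin{bmatrix} \text{row } 1 \\ \vdots \\ \text{row } m \end{bmatrix} \begin{bmatrix} x \end{bmatrix} = \begin{bmatrix} 0 \\ \vdots \\ 0 \end{bmatrix} \quad \Longleftarrow \quad \begin{matrix} (\text{row } 1) \cdot x \text{ is zero} \\ \vdots \\ (\text{row } m) \cdot x \text{ is zero} \end{matrix} \qquad (1)$$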


The first equation says that row 1 is perpendicular to x. The last equation says that row m is
perpendicular to x. Every row has a zero dot product with x. Then x is also perpendicular
to every combination of the rows. The whole row space $C(A^T)$ is orthogonal to $N(A)$.

Here is a second proof of that orthogonality for readers who like matrix shorthand.
The vectors in the row space are combinations $A^T y$ of the rows. Take the dot product
of $A^T y$ with any x in the nullspace. These vectors are perpendicular:


$$\text{Nullspace and Row space} \qquad x^T (A^T y) = (Ax)^T y = 0^T y = 0. \qquad (2)$$


We like the first proof. You can see those rows of A multiplying x to produce zeros in
equation (1). The second proof shows why A and $A^T$ are both in the Fundamental Theorem.
$A^T$ goes with y and A goes with x. At the end we used Ax = 0.


Example 3 The rows of A are perpendicular to x = (1, 1, —1) in the nullspace: 


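$$Ax = \begin{bmatrix} 1 & 3 & 4 \\ 5 & 2 & 7 \end{bmatrix} \begin{bmatrix} 1 \\ 1 \\ -1 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix} \quad \text{gives the dot products} \quad \begin{matrix} 1 + 3 - 4 = 0 \\ 5 + 2 - 7 = 0 \end{matrix}$$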

Now we turn to the other two subspaces. In this example, the column space is all of $\mathbf{R}^2$.
The nullspace of $A^T$ is only the zero vector (orthogonal to every vector). The columns of
A and nullspace of $A^T$ are always orthogonal subspaces.


Every vector y in the nullspace of $A^T$ is perpendicular to every column of A.
The left nullspace $N(A^T)$ and the column space $C(A)$ are orthogonal in $\mathbf{R}^m$.


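Apply the original proof to $A^T$. Its nullspace is orthogonal to its row space, and the row
space of $A^T$ is the column space of A. Q.E.D.
For a visual proof, look at $A^T y = 0$. Each column of A multiplies y to give 0:


$$C(A) \perp N(A^T) \qquad A^T y = \begin{bmatrix} (\text{column } 1)^T \\ \vdots \\ (\text{column } n)^T \end{bmatrix} y = \begin{bmatrix} 0 \\ \vdots \\ 0 \end{bmatrix}. \qquad (3)$$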


The dot product of y with every column of A is zero. Then y in the left nullspace is 
perpendicular to each column—and to the whole column space. 

Orthogonal Complements 
Important The fundamental subspaces are more than just orthogonal (in pairs). 
Their dimensions are also right. Two lines could be perpendicular in $\mathbf{R}^3$, but those lines
could not be the row space and nullspace of a 3 by 3 matrix. The lines have dimensions 1
and 1, adding to 2. The correct dimensions r and n − r must add to n = 3.


The fundamental subspaces have dimensions 2 and 1, or 3 and 0. Those subspaces are 
not only orthogonal, they are orthogonal complements. 


DEFINITION The orthogonal complement of a subspace V contains every vector that is 
perpendicular to V. This orthogonal subspace is denoted by $V^\perp$ (pronounced “V perp”).


By this definition, the nullspace is the orthogonal complement of the row space. 
Every x that is perpendicular to the rows satisfies Ax = 0.

[Figure 4.2 diagram: on the left, the row space (dimension r) and the nullspace of A (dimension n − r); on the right, the column space (dimension r) and the nullspace of $A^T$ (dimension m − r). A takes $x_{row}$ to $Ax_{row} = b$ and $x_{null}$ to $Ax_{null} = 0$.]

Figure 4.2: Two pairs of orthogonal subspaces. The dimensions add to n and add to m.
This is an important picture: one pair of subspaces is in $\mathbf{R}^n$ and one pair is in $\mathbf{R}^m$.


The reverse is also true. If v is orthogonal to the nullspace, it must be in the row 
space. Otherwise we could add this v as an extra row of the matrix, without changing its 
nullspace. The row space would grow, which breaks the law r + (n — r) = n. We conclude 
that the nullspace complement $N(A)^\perp$ is exactly the row space $C(A^T)$.

The left nullspace and column space are orthogonal in $\mathbf{R}^m$, and they are orthogonal
complements. Their dimensions r and m − r add to the full dimension m.


Fundamental Theorem of Linear Algebra, Part 2


Part 1 gave the dimensions of the subspaces. Part 2 gives the 90° angles between them.
The point of “complements” is that every x can be split into a row space component $x_r$
and a nullspace component $x_n$. When A multiplies $x = x_r + x_n$, Figure 4.3 shows what
happens:


The nullspace component goes to zero: $Ax_n = 0$.
The row space component goes to the column space: $Ax_r = Ax$.


Every vector goes to the column space! Multiplying by A cannot do anything else.

[Figure 4.3 diagram: b in the column space (dim r); the nullspace of $A^T$ (dim m − r).]

Figure 4.3: This update of Figure 4.2 shows the true action of A on $x = x_r + x_n$.
Row space vector $x_r$ to column space, nullspace vector $x_n$ to zero.


More than that: Every vector b in the column space comes from one and only one vector
in the row space. Proof: If $Ax_r = Ax_r'$, the difference $x_r - x_r'$ is in the nullspace.
It is also in the row space, where $x_r$ and $x_r'$ came from. This difference must be the zero
vector, because the nullspace and row space are perpendicular. Therefore $x_r = x_r'$.

There is an r by r invertible matrix hiding inside A, if we throw away the two nullspaces.
From the row space to the column space, A is invertible. The “pseudoinverse” will invert 
it in Section 7.3. 


Example 4 Every diagonal matrix has an r by r invertible submatrix: 


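$$A = \begin{bmatrix} 3 & 0 & 0 & 0 & 0 \\ 0 & 5 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix} \quad \text{contains the submatrix} \quad \begin{bmatrix} 3 & 0 \\ 0 & 5 \end{bmatrix}.$$
The other eleven zeros are responsible for the nullspaces. The rank of B is also r = 2:


$$B = \begin{bmatrix} 1 & 2 & 3 & 4 & 5 \\ 1 & 2 & 4 & 5 & 6 \\ 1 & 2 & 4 & 5 & 6 \end{bmatrix} \quad \text{contains} \quad \begin{bmatrix} 1 & 3 \\ 1 & 4 \end{bmatrix} \quad \text{in the pivot rows and columns.}$$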


Every A becomes a diagonal matrix, when we choose the right bases for $\mathbf{R}^n$ and $\mathbf{R}^m$.
This Singular Value Decomposition has become extremely important in applications.

Combining Bases from Subspaces


What follows are some valuable facts about bases. They were saved until now—when we 
are ready to use them. After a week you have a clearer sense of what a basis is (linearly
independent vectors that span the space). Normally we have to check both of these properties.
When the count is right, one property implies the other:


Starting with the correct number of vectors, one property of a basis produces the other.
This is true in any vector space, but we care most about $\mathbf{R}^n$. When the vectors go into the
columns of an n by n square matrix A, here are the same two facts:


Uniqueness implies existence and existence implies uniqueness. Then A is invertible. If 
there are no free variables, the solution x is unique. There must be n pivots. Then back 
substitution solves Ax = b (the solution exists). 


Starting in the opposite direction, suppose Ax = b can be solved for every b 
(existence of solutions). Then elimination produced no zero rows. There are n pivots and 
no free variables. The nullspace contains only x = 0 (uniqueness of solutions). 


With bases for the row space and the nullspace, we have r + (n − r) = n vectors.
This is the right number. Those n vectors are independent.² Therefore they span $\mathbf{R}^n$.


Each x is the sum $x_r + x_n$ of a row space vector $x_r$ and a nullspace vector $x_n$.


The splitting in Figure 4.3 shows the key point of orthogonal complements—the dimen- 
sions add to n and all vectors are fully accounted for. 


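Example 5 For $A = \begin{bmatrix} 1 & 2 \\ 3 & 6 \end{bmatrix}$ split $x = \begin{bmatrix} 4 \\ 3 \end{bmatrix}$ into $x_r + x_n = \begin{bmatrix} 2 \\ 4 \end{bmatrix} + \begin{bmatrix} 2 \\ -1 \end{bmatrix}$.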


The vector (2, 4) is in the row space. The orthogonal vector (2, —1) is in the nullspace. 
The next section will compute this splitting for any A and x, by a projection. 
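
For a rank one matrix such as the one in Example 5, the row space is a single line, so the splitting can be sketched in a few lines of plain Python. This is only an illustration under that rank one assumption: it takes the row space component as the part of x along row 1 and the nullspace component as the remainder. Exact fractions keep the arithmetic free of rounding.

```python
from fractions import Fraction

# Example 5 data: a rank one matrix A and the vector x to be split.
A = [[1, 2],
     [3, 6]]
x = [Fraction(4), Fraction(3)]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

r = A[0]                                   # rank one: the row space is the line through row 1
c = dot(r, x) / dot(r, r)                  # multiple of r that is closest to x
x_r = [c * ri for ri in r]                 # row space component
x_n = [xi - xri for xi, xri in zip(x, x_r)]  # what is left over

Ax_n = [dot(row, x_n) for row in A]        # should be the zero vector
print(x_r, x_n, Ax_n, dot(x_r, x_n))
```

The remainder `x_n` is sent to zero by A, and it is perpendicular to `x_r`, which is exactly the picture in Figure 4.3.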


²If a combination of all n vectors gives $x_r + x_n = 0$, then $x_r = -x_n$ is in both subspaces.
So $x_r = x_n = 0$. All coefficients of the row space basis and nullspace basis must be zero, which
proves independence of the n vectors together.

REVIEW OF THE KEY IDEAS


1. Subspaces V and W are orthogonal if every v in V is orthogonal to every w in W. 


2. V and W are “orthogonal complements” if W contains all vectors perpendicular to
V (and vice versa). Inside $\mathbf{R}^n$, the dimensions of complements V and W add to n.


3. The nullspace $N(A)$ and the row space $C(A^T)$ are orthogonal complements, from
Ax = 0. Similarly $N(A^T)$ and $C(A)$ are orthogonal complements.


4. Any n independent vectors in $\mathbf{R}^n$ will span $\mathbf{R}^n$.


5. Every x in $\mathbf{R}^n$ has a nullspace component $x_n$ and a row space component $x_r$.


WORKED EXAMPLES


4.1A Suppose S is a six-dimensional subspace of nine-dimensional space $\mathbf{R}^9$.
(a) What are the possible dimensions of subspaces orthogonal to S?
(b) What are the possible dimensions of the orthogonal complement $S^\perp$ of S?
(c) What is the smallest possible size of a matrix A that has row space S?


(d) What is the shape of its nullspace matrix N?


Solution
(a) If S is six-dimensional in $\mathbf{R}^9$, subspaces orthogonal to S can have dimensions 0, 1, 2, 3.
(b) The complement $S^\perp$ is the largest orthogonal subspace, with dimension 3.
(c) The smallest matrix A is 6 by 9 (its six rows are a basis for S).
(d) Its nullspace matrix N is 9 by 3. The columns of N contain a basis for $S^\perp$.


If a new row 7 of B is a combination of the six rows of A, then B has the same row
space as A. It also has the same nullspace matrix N. The special solutions $s_1, s_2, s_3$ will
be the same. Elimination will change row 7 of B to all zeros.


4.1B The equation x − 3y − 4z = 0 describes a plane P in $\mathbf{R}^3$ (actually a subspace).


(a) The plane P is the nullspace N (A) of what 1 by 3 matrix A? 


(b) Find a basis $s_1, s_2$ of special solutions of x − 3y − 4z = 0 (these would be the
columns of the nullspace matrix N).
(c) Also find a basis for the line $P^\perp$ that is perpendicular to P.


(d) Split v = (6, 4, 5) into its nullspace component $v_n$ in P and its row space component
$v_r$ in $P^\perp$.


Solution 
(a) The equation x − 3y − 4z = 0 is Ax = 0 for the 1 by 3 matrix A = [1 −3 −4].


(b) Columns 2 and 3 are free (the only pivot is 1). The special solutions with free variables
1 and 0 are $s_1 = (3, 1, 0)$ and $s_2 = (4, 0, 1)$ in the plane P = N(A).


(c) The row space of A is the line $P^\perp$ in the direction of the row z = (1, −3, −4).


(d) To split v into $v_n + v_r = (c_1 s_1 + c_2 s_2) + c_3 z$, solve for $c_1 = 1, c_2 = 1, c_3 = -1$:
$$\begin{bmatrix} 6 \\ 4 \\ 5 \end{bmatrix} = \begin{bmatrix} 3 & 4 & 1 \\ 1 & 0 & -3 \\ 0 & 1 & -4 \end{bmatrix} \begin{bmatrix} 1 \\ 1 \\ -1 \end{bmatrix}$$
$v_n = s_1 + s_2 = (7, 1, 1)$ is in P = N(A)
$v_r = -z = (-1, 3, 4)$ is in $P^\perp = C(A^T)$
v = (6, 4, 5) equals (7, 1, 1) + (−1, 3, 4)


This method used a basis for each subspace combined into an overall basis $s_1, s_2, z$.
Section 4.2 will also project v onto a subspace S. There we will not need a basis for the
perpendicular subspace $S^\perp$.
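
The coefficients in part (d) can be recovered by solving the 3 by 3 system directly. The sketch below is one way to do that in plain Python with exact fractions. The function `solve` is a small illustrative Gauss-Jordan routine written for this check, and it assumes the matrix is square and invertible.

```python
from fractions import Fraction

def solve(M, rhs):
    """Solve M c = rhs by Gauss-Jordan elimination (M square and invertible)."""
    n = len(M)
    aug = [[Fraction(v) for v in row] + [Fraction(b)] for row, b in zip(M, rhs)]
    for col in range(n):
        # Find a row with a nonzero entry in this column and move it up.
        pivot_row = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot_row] = aug[pivot_row], aug[col]
        pivot = aug[col][col]
        aug[col] = [v / pivot for v in aug[col]]
        # Clear the column above and below the pivot.
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n] for row in aug]

s1, s2, z = [3, 1, 0], [4, 0, 1], [1, -3, -4]   # basis for P, then basis for P-perp
v = [6, 4, 5]

M = [[s1[i], s2[i], z[i]] for i in range(3)]    # columns s1, s2, z
c1, c2, c3 = solve(M, v)

v_n = [c1 * a + c2 * b for a, b in zip(s1, s2)]  # component in the plane P = N(A)
v_r = [c3 * zi for zi in z]                      # component along the row z
print(c1, c2, c3, v_n, v_r)
```

The two pieces add back to v, and their dot product is zero because one lies in N(A) and the other in the row space.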


Problem Set 4.1 


Questions 1-12 grow out of Figures 4.2 and 4.3 with four subspaces. 


1 Construct any 2 by 3 matrix of rank one. Copy Figure 4.2 and put one vector in each 
subspace (two in the nullspace). Which vectors are orthogonal? 


2 Redraw Figure 4.3 for a 3 by 2 matrix of rank r = 2. Which subspace is Z (zero 
vector only)? The nullspace part of any vector x in $\mathbf{R}^2$ is $x_n$ = ____.


3 Construct a matrix with the required property or say why that is impossible: 
(a) Column space contains | z| and [3]. nullspace contains [| 
(b) Row space contains |2] and |- | nullspace contains [1] 


(c) Ax = H has a solution and AT [e] = [e] 
(d) Every row is orthogonal to every column (A is not the zero matrix) 


(e) Columns add up to a column of zeros, rows add to a row of 1’s. 


4 If AB = 0 then the columns of B are in the ____ of A. The rows of A are in the ____
of B. Why can’t A and B be 3 by 3 matrices of rank 2?
5


10 


11 


12 


(a) If Ax = b has a solution and $A^T y = 0$, is ($y^T x = 0$) or ($y^T b = 0$)?
(b) If $A^T y = (1, 1, 1)$ has a solution and Ax = 0, then ____.


This system of equations Ax = b has no solution (they lead to 0 = 1): 


x + 2y + 2z = 5
2x + 2y + 3z = 5
3x + 4y + 5z = 9


Find numbers $y_1, y_2, y_3$ to multiply the equations so they add to 0 = 1. You have
found a vector y in which subspace? Its dot product $y^T b$ is 1, so no solution x.


Every system with no solution is like the one in Problem 6. There are numbers
$y_1, \ldots, y_m$ that multiply the m equations so they add up to 0 = 1. This is called
Fredholm’s Alternative:


Exactly one of these problems has a solution
$Ax = b$ OR $A^T y = 0$ with $y^T b = 1$.


If b is not in the column space of A, it is not orthogonal to the nullspace of $A^T$.
Multiply the equations $x_1 - x_2 = 1$ and $x_2 - x_3 = 1$ and $x_1 - x_3 = 1$ by numbers
$y_1, y_2, y_3$ chosen so that the equations add up to 0 = 1.


In Figure 4.3, how do we know that $Ax_r$ is equal to $Ax$? How do we know that this
vector is in the column space? If A = [11] and x = [4] what is $x_r$?


If $A^T A x = 0$ then Ax = 0. Reason: Ax is in the nullspace of $A^T$ and also in the
____ of A and those spaces are ____. Conclusion: $A^T A$ has the same nullspace
as A. This key fact is repeated in the next section.


Suppose A is a symmetric matrix (AT = A). 


(a) Why is its column space perpendicular to its nullspace? 


(b) If Ax = 0 and Az = 5z, which subspaces contain these “eigenvectors” x 
and z? Symmetric matrices have perpendicular eigenvectors $x^T z = 0$.


(Recommended) Draw Figure 4.2 to show each subspace correctly for 


1 2 1 0 
a=; l and B=(; ol: 


Find the pieces xy and xy and draw Figure 4.3 properly if 


A=1!10 0 and pi

Questions 13-23 are about orthogonal subspaces.


13 Put bases for the subspaces V and W into the columns of matrices V and W. Explain
why the test for orthogonal subspaces can be written $V^T W$ = zero matrix. This
matches $v^T w = 0$ for orthogonal vectors.


14 The floor V and the wall W are not orthogonal subspaces, because they share a
nonzero vector (along the line where they meet). No planes V and W in $\mathbf{R}^3$ can be
orthogonal! Find a vector in the column spaces of both matrices:


$$A = \begin{bmatrix} 1 & 2 \\ 1 & 3 \\ 1 & 2 \end{bmatrix} \quad\text{and}\quad B = \begin{bmatrix} 5 & 4 \\ 6 & 3 \\ 5 & 1 \end{bmatrix}$$


This will be a vector $Ax$ and also $B\hat{x}$. Think 3 by 4 with the matrix [A B].


15 Extend Problem 14 to a p-dimensional subspace V and a q-dimensional subspace
W of $\mathbf{R}^n$. What inequality on p + q guarantees that V intersects W in a nonzero
vector? These subspaces cannot be orthogonal.


16 Prove that every y in $N(A^T)$ is perpendicular to every Ax in the column space, using
the matrix shorthand of equation (2). Start from $A^T y = 0$.


17 If S is the subspace of $\mathbf{R}^3$ containing only the zero vector, what is $S^\perp$? If S is
spanned by (1, 1, 1), what is $S^\perp$? If S is spanned by (1, 1, 1) and (1, 1, −1), what is
a basis for $S^\perp$?


18 Suppose S only contains two vectors (1, 5, 1) and (2, 2, 2) (not a subspace). Then
$S^\perp$ is the nullspace of the matrix A = ____. $S^\perp$ is a subspace even if S is not.


19 Suppose L is a one-dimensional subspace (a line) in $\mathbf{R}^3$. Its orthogonal complement
$L^\perp$ is the ____ perpendicular to L. Then $(L^\perp)^\perp$ is a ____ perpendicular to $L^\perp$.
In fact $(L^\perp)^\perp$ is the same as ____.


20 Suppose V is the whole space $\mathbf{R}^4$. Then $V^\perp$ contains only the vector ____. Then
$(V^\perp)^\perp$ is ____. So $(V^\perp)^\perp$ is the same as ____.


21 Suppose S is spanned by the vectors (1, 2, 2, 3) and (1, 3, 3, 2). Find two vectors
that span $S^\perp$. This is the same as solving Ax = 0 for which A?


22 If P is the plane of vectors in $\mathbf{R}^4$ satisfying $x_1 + x_2 + x_3 + x_4 = 0$, write a basis
for $P^\perp$. Construct a matrix that has P as its nullspace.


23 If a subspace S is contained in a subspace V, prove that $S^\perp$ contains $V^\perp$.

Questions 24-30 are about perpendicular columns and rows.


24 


25 
26 


27 


28 


29 


30 


31 


32 


33 


Suppose an n by n matrix is invertible: $AA^{-1} = I$. Then the first column of $A^{-1}$ is
orthogonal to the space spanned by which rows of A?


Find $A^T A$ if the columns of A are unit vectors, all mutually perpendicular.


Construct a 3 by 3 matrix A with no zero entries whose columns are mutually perpendicular.
Compute $A^T A$. Why is it a diagonal matrix?


The lines $3x + y = b_1$ and $6x + 2y = b_2$ are ____. They are the same line
if ____. In that case $(b_1, b_2)$ is perpendicular to the vector ____. The nullspace
of the matrix is the line 3x + y = ____. One particular vector in that nullspace is
____.


Why is each of these statements false?
(a) (1, 1, 1) is perpendicular to (1, 1, −2) so the planes x + y + z = 0 and x + y −
2z = 0 are orthogonal subspaces.


(b) The subspace spanned by (1, 1, 0, 0, 0) and (0, 0, 0, 1, 1) is the orthogonal com- 
plement of the subspace spanned by (1, —1, 0,0, 0) and (2, —2, 3, 4, —4). 


(c) Two subspaces that meet only in the zero vector are orthogonal. 


Find a matrix with v = (1, 2,3) in the row space and column space. Find another 
matrix with v in the nullspace and column space. Which pairs of subspaces can v 
not be in? 


Challenge Problems 


Suppose A is 3 by 4 and B is 4 by 5 and AB = 0. So N(A) contains C(B).
Prove from the dimensions of N(A) and C(B) that rank(A) + rank(B) ≤ 4.


The command N = null(A) will produce a basis for the nullspace of A. Then the
command B = null(N') will produce a basis for the ____ of A.


Suppose I give you four nonzero vectors r, n, c, l in $\mathbf{R}^2$.


(a) What are the conditions for those to be bases for the four fundamental subspaces
$C(A^T)$, $N(A)$, $C(A)$, $N(A^T)$ of a 2 by 2 matrix?


(b) What is one possible matrix A?
Suppose I give you eight vectors $r_1, r_2, n_1, n_2, c_1, c_2, l_1, l_2$ in $\mathbf{R}^4$.


(a) What are the conditions for those pairs to be bases for the four fundamental
subspaces of a 4 by 4 matrix?


(b) What is one possible matrix A?

4.2 Projections


May we start this section with two questions? (In addition to that one.) The first question
aims to show that projections are easy to visualize. The second question is about
“projection matrices”: symmetric matrices with $P^2 = P$. The projection of b is $Pb$.


1 What are the projections of b = (2, 3, 4) onto the z axis and the xy plane? 
2 What matrices produce those projections onto a line and a plane? 


When b is projected onto a line, its projection p is the part of b along that line. 
If b is projected onto a plane, p is the part in that plane. The projection p is Pb. 


The projection matrix P multiplies b to give p. This section finds p and P. 


The projection onto the z axis we call $p_1$. The second projection drops straight down to
the xy plane. The picture in your mind should be Figure 4.4. Start with b = (2, 3, 4).
One projection gives $p_1 = (0, 0, 4)$ and the other gives $p_2 = (2, 3, 0)$. Those are the parts
of b along the z axis and in the xy plane.

The projection matrices $P_1$ and $P_2$ are 3 by 3. They multiply b with 3 components
to produce p with 3 components. Projection onto a line comes from a rank one matrix.
Projection onto a plane comes from a rank two matrix:


$$\text{Onto the } z \text{ axis: } P_1 = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix} \qquad \text{Onto the } xy \text{ plane: } P_2 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix}$$


$P_1$ picks out the z component of every vector. $P_2$ picks out the x and y components.
To find the projections $p_1$ and $p_2$ of b, multiply b by $P_1$ and $P_2$ (small p for the vector,
capital P for the matrix that produces it):


$$p_1 = P_1 b = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ z \end{bmatrix} \qquad p_2 = P_2 b = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} x \\ y \\ 0 \end{bmatrix}$$


In this case the projections $p_1$ and $p_2$ are perpendicular. The xy plane and the z axis
are orthogonal subspaces, like the floor of a room and the line between two walls.
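
These two projections are easy to reproduce. The following is a minimal sketch in plain Python (the helper names are illustrative): it multiplies b = (2, 3, 4) by each matrix and checks the defining property $P^2 = P$ mentioned at the start of the section.

```python
# P1 projects onto the z axis, P2 onto the xy plane (both taken from the text).
P1 = [[0, 0, 0],
      [0, 0, 0],
      [0, 0, 1]]
P2 = [[1, 0, 0],
      [0, 1, 0],
      [0, 0, 0]]
b = [2, 3, 4]

def matvec(P, v):
    return [sum(pij * vj for pij, vj in zip(row, v)) for row in P]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

p1 = matvec(P1, b)    # part of b along the z axis
p2 = matvec(P2, b)    # part of b in the xy plane

idempotent = matmul(P1, P1) == P1 and matmul(P2, P2) == P2   # projecting twice changes nothing
perpendicular = dot(p1, p2) == 0

print(p1, p2, idempotent, perpendicular)
```

Projecting a second time leaves the vector where it is, which is what $P^2 = P$ says.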


Figure 4.4: The projections $p_1 = P_1 b$ and $p_2 = P_2 b$ onto the z axis and the xy plane.

More than that, the line and plane are orthogonal complements. Their dimensions add
to 1 + 2 = 3. Every vector b in the whole space is the sum of its parts in the two subspaces.
The projections $p_1$ and $p_2$ are exactly those parts:


$$\text{The vectors give } p_1 + p_2 = b. \qquad \text{The matrices give } P_1 + P_2 = I. \qquad (1)$$


This is perfect. Our goal is reached for this example. We have the same goal for any line
and any plane and any n-dimensional subspace. The object is to find the part p in each
subspace, and the projection matrix P that produces that part p = Pb. Every subspace
of $\mathbf{R}^m$ has its own m by m projection matrix. To compute P, we absolutely need a good
description of the subspace that it projects onto.

The best description of a subspace is a basis. We put the basis vectors into the columns
of A. Now we are projecting onto the column space of A! Certainly the z axis is the
column space of the 3 by 1 matrix $A_1$. The xy plane is the column space of $A_2$. That plane
is also the column space of $A_3$ (a subspace has many bases):


$$A_1 = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix} \quad\text{and}\quad A_2 = \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 0 \end{bmatrix} \quad\text{and}\quad A_3 = \begin{bmatrix} 1 & 2 \\ 2 & 3 \\ 0 & 0 \end{bmatrix}$$


Our problem is to project any b onto the column space of any m by n matrix.
Start with a line (dimension n = 1). The matrix A has only one column. Call it a.


Projection Onto a Line 


A line goes through the origin in the direction of $a = (a_1, \ldots, a_m)$. Along that line, we
want the point p closest to $b = (b_1, \ldots, b_m)$. The key to projection is orthogonality:
The line from b to p is perpendicular to the vector a. This is the dotted line marked
e for error in Figure 4.5, which we now compute by algebra.


The projection p is some multiple of a. Call it $p = \hat{x}a$ = "x hat" times a. Computing
this number $\hat{x}$ will give the vector p. Then from the formula for p, we read off the projection
matrix P. These three steps will lead to all projection matrices: find $\hat{x}$, then find the
vector p, then find the matrix P.

The dotted line b - p is $e = b - \hat{x}a$. It is perpendicular to a, and this will determine $\hat{x}$.
Use the fact that b - p is perpendicular to a when their dot product is zero:

Projecting b onto a, error $e = b - \hat{x}a$:

$$a \cdot (b - \hat{x}a) = 0 \quad\text{or}\quad a \cdot b - \hat{x}\, a \cdot a = 0 \quad\text{so}\quad \hat{x} = \frac{a \cdot b}{a \cdot a} = \frac{a^T b}{a^T a}.$$

The multiplication $a^T b$ is the same as $a \cdot b$. Using the transpose is better, because it
applies also to matrices. Our formula $\hat{x} = a^T b / a^T a$ gives the projection $p = \hat{x}a$.
Figure 4.5 (label for projection onto a subspace): $p = A\hat{x} = A(A^T A)^{-1} A^T b = Pb$.


The projection of b onto the line through a is the vector $p = \hat{x}a = \dfrac{a^T b}{a^T a}\,a$.

Special case 1: If b = a then $\hat{x} = 1$. The projection of a onto a is a itself. Pa = a.

Special case 2: If b is perpendicular to a then $a^T b = 0$. The projection is p = 0.


Example 1 Project $b = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}$ onto $a = \begin{bmatrix} 1 \\ 2 \\ 2 \end{bmatrix}$ to find $p = \hat{x}a$ in Figure 4.5.

Solution The number $\hat{x}$ is the ratio of $a^T b = 5$ to $a^T a = 9$. So the projection is $p = \frac{5}{9}a$.
The error vector between b and p is e = b - p. Those vectors p and e will add to
b = (1, 1, 1):

$$p = \frac{5}{9}a = \left(\frac{5}{9}, \frac{10}{9}, \frac{10}{9}\right) \quad\text{and}\quad e = b - p = \left(\frac{4}{9}, -\frac{1}{9}, -\frac{1}{9}\right).$$

The error e should be perpendicular to a = (1, 2, 2) and it is: $e^T a = \frac{4}{9} - \frac{2}{9} - \frac{2}{9} = 0$.
Look at the right triangle of b, p, and e. The vector b is split into two parts: its
component along the line is p, its perpendicular part is e. Those two sides of a right
triangle have length $\|b\| \cos\theta$ and $\|b\| \sin\theta$. Trigonometry matches the dot product:


$$p = \frac{a^T b}{a^T a}\,a \quad\text{has length}\quad \|p\| = \frac{\|a\|\,\|b\| \cos\theta}{\|a\|^2}\,\|a\| = \|b\| \cos\theta.$$

The dot product is a lot simpler than getting involved with $\cos\theta$ and the length of b.
The example has square roots in $\cos\theta = 5/(3\sqrt{3})$ and $\|b\| = \sqrt{3}$. There are no square
roots in the projection p = 5a/9. The good way to reach 5/9 is $b^T a / a^T a$.
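The three steps for a line (find $\hat{x}$, then p, then e) can be checked numerically. What follows is a minimal Python sketch rather than anything from the text itself: the helper names `dot` and `project_onto_line` are invented for illustration, and exact fractions are assumed so that 5/9 appears without rounding.

```python
from fractions import Fraction

def dot(u, v):
    """Dot product u . v, the same number as u^T v."""
    return sum(ui * vi for ui, vi in zip(u, v))

def project_onto_line(b, a):
    """Return (x_hat, p, e) for the projection of b onto the line through a."""
    x_hat = Fraction(dot(a, b), dot(a, a))   # x_hat = a^T b / a^T a
    p = [x_hat * ai for ai in a]             # p = x_hat * a
    e = [bi - pi for bi, pi in zip(b, p)]    # e = b - p
    return x_hat, p, e

# Example 1: b = (1, 1, 1) projected onto a = (1, 2, 2)
a = [1, 2, 2]
b = [1, 1, 1]
x_hat, p, e = project_onto_line(b, a)
print("x_hat =", x_hat)
print("p =", [str(c) for c in p])
print("e =", [str(c) for c in e])
print("e . a =", dot(e, a))   # perpendicularity check
```

The only division happens once, by the number $a^T a$, which is why no square roots or angles are needed.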


Now comes the projection matrix. In the formula for p, what matrix is multiplying b?
You can see the matrix better if the number $\hat{x}$ is on the right side of a:

$$p = a\hat{x} = a\,\frac{a^T b}{a^T a} = Pb \quad\text{when the matrix is}\quad P = \frac{a a^T}{a^T a}.$$

P is a column times a row! The column is a, the row is $a^T$. Then divide by the number
$a^T a$. The projection matrix P is m by m, but its rank is one. We are projecting onto a
one-dimensional subspace, the line through a. That is the column space of P.


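Example 2 Find the projection matrix $P = \dfrac{a a^T}{a^T a}$ onto the line through $a = \begin{bmatrix} 1 \\ 2 \\ 2 \end{bmatrix}$.

Solution Multiply column a times row $a^T$ and divide by $a^T a = 9$:

$$\text{Projection matrix}\quad P = \frac{a a^T}{a^T a} = \frac{1}{9} \begin{bmatrix} 1 \\ 2 \\ 2 \end{bmatrix} \begin{bmatrix} 1 & 2 & 2 \end{bmatrix} = \frac{1}{9} \begin{bmatrix} 1 & 2 & 2 \\ 2 & 4 & 4 \\ 2 & 4 & 4 \end{bmatrix}.$$

This matrix projects any vector b onto a. Check p = Pb for b = (1, 1, 1) in Example 1:

$$p = Pb = \frac{1}{9} \begin{bmatrix} 1 & 2 & 2 \\ 2 & 4 & 4 \\ 2 & 4 & 4 \end{bmatrix} \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} = \frac{1}{9} \begin{bmatrix} 5 \\ 10 \\ 10 \end{bmatrix} \quad\text{which is correct.}$$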


If the vector a is doubled, the matrix P stays the same. It still projects onto the same line.
If the matrix is squared, $P^2$ equals P. Projecting a second time doesn't change anything,
so $P^2 = P$. The diagonal entries of P add up to $\frac{1}{9}(1 + 4 + 4) = 1$.
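The properties just listed (P is unchanged when a is doubled, $P^2 = P$, trace 1) are easy to confirm by direct multiplication. The sketch below is an illustration only; `line_projection_matrix`, `matmul`, and `matvec` are hypothetical helper names, and matrices are assumed to be stored as lists of rows with exact fractions.

```python
from fractions import Fraction

def line_projection_matrix(a):
    """P = a a^T / a^T a: column times row, divided by the number a^T a."""
    aTa = sum(ai * ai for ai in a)
    return [[Fraction(ai * aj, aTa) for aj in a] for ai in a]

def matmul(X, Y):
    """Matrix product of X and Y (lists of rows)."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def matvec(X, v):
    """Matrix times vector."""
    return [sum(xij * vj for xij, vj in zip(row, v)) for row in X]

P = line_projection_matrix([1, 2, 2])
P_doubled = line_projection_matrix([2, 4, 4])   # same line, a doubled
trace = sum(P[i][i] for i in range(3))

# I - P produces the perpendicular part e of any b
I_minus_P = [[(1 if i == j else 0) - P[i][j] for j in range(3)] for i in range(3)]

print("P^2 == P:", matmul(P, P) == P)
print("doubling a keeps P:", P == P_doubled)
print("trace of P:", trace)
print("Pb:", [str(c) for c in matvec(P, [1, 1, 1])])
print("(I - P)b:", [str(c) for c in matvec(I_minus_P, [1, 1, 1])])
```

The last line reproduces the error vector e from Example 1, in agreement with the remark that I - P is itself a projection.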


The matrix I - P should be a projection too. It produces the other side e of the
triangle, the perpendicular part of b. Note that (I - P)b equals b - p which is e in the
left nullspace. When P projects onto one subspace, I - P projects onto the perpendicular
subspace. Here I - P projects onto the plane perpendicular to a.


Now we move beyond projection onto a line. Projecting onto an n-dimensional
subspace of $R^m$ takes more effort. The crucial formulas will be collected in equations
(5)-(6)-(7). Basically you need to remember those three equations.


Projection Onto a Subspace 
Start with n vectors $a_1, \ldots, a_n$ in $R^m$. Assume that these a's are linearly independent.

Problem: Find the combination $p = \hat{x}_1 a_1 + \cdots + \hat{x}_n a_n$ closest to a given vector b.
We are projecting each b in $R^m$ onto the subspace spanned by the a's, to get p.


With n = 1 (only one vector $a_1$) this is projection onto a line. The line is the column space
of A, which has just one column. In general the matrix A has n columns $a_1, \ldots, a_n$.

The combinations in $R^m$ are the vectors Ax in the column space. We are looking for
the particular combination $p = A\hat{x}$ (the projection) that is closest to b. The hat over $\hat{x}$
indicates the best choice, to give the closest vector in the column space. That choice is
$\hat{x} = a^T b / a^T a$ when n = 1. For n > 1, the best $\hat{x}$ is to be found now.


We compute projections onto n-dimensional subspaces in three steps as before:
Find the vector $\hat{x}$, find the projection $p = A\hat{x}$, find the matrix P.

The key is in the geometry! The dotted line in Figure 4.5 goes from b to the nearest
point $A\hat{x}$ in the subspace. This error vector $b - A\hat{x}$ is perpendicular to the subspace.
The error $b - A\hat{x}$ makes a right angle with all the vectors $a_1, \ldots, a_n$. The n right angles
give the n equations for $\hat{x}$:

$$\begin{matrix} a_1^T (b - A\hat{x}) = 0 \\ \vdots \\ a_n^T (b - A\hat{x}) = 0 \end{matrix} \quad\text{or}\quad \begin{bmatrix} - \; a_1^T \; - \\ \vdots \\ - \; a_n^T \; - \end{bmatrix} \begin{bmatrix} b - A\hat{x} \end{bmatrix} = \begin{bmatrix} 0 \end{bmatrix}. \qquad (4)$$


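The matrix with those rows $a_i^T$ is $A^T$. The n equations are exactly $A^T(b - A\hat{x}) = 0$.
Rewrite $A^T(b - A\hat{x}) = 0$ in its famous form $A^T A\hat{x} = A^T b$. This is the equation for $\hat{x}$,
and the coefficient matrix is $A^T A$. Now we can find $\hat{x}$ and p and P, in that order:

Find $\hat{x}$ (n by 1): $A^T(b - A\hat{x}) = 0$ or $A^T A\hat{x} = A^T b$. (5)
Find p (m by 1): $p = A\hat{x} = A(A^T A)^{-1} A^T b$. (6)
Find P (m by m): $P = A(A^T A)^{-1} A^T$. (7)

Compare with projection onto a line, when A has only one column a. Then $\hat{x} = a^T b / a^T a$
and $p = a\,a^T b / a^T a$ and $P = a a^T / a^T a$.
Those formulas are identical with (5) and (6) and (7). The number $a^T a$ becomes the
matrix $A^T A$. When it is a number, we divide by it. When it is a matrix, we invert it.
The new formulas contain $(A^T A)^{-1}$ instead of $1/a^T a$. The linear independence of the
columns $a_1, \ldots, a_n$ will guarantee that this inverse matrix exists.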

The key step was $A^T(b - A\hat{x}) = 0$. We used geometry (e is perpendicular to all the
a's). Linear algebra gives this "normal equation" too, in a very quick way:

1. Our subspace is the column space of A.
2. The error vector $b - A\hat{x}$ is perpendicular to that column space.
3. Therefore $b - A\hat{x}$ is in the nullspace of $A^T$. This means $A^T(b - A\hat{x}) = 0$.

The left nullspace is important in projections. That nullspace of $A^T$ contains the error vector
$e = b - A\hat{x}$. The vector b is being split into the projection p and the error e = b - p.
Projection produces a right triangle (Figure 4.5) with sides p, e, and b.
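Example 3 If $A = \begin{bmatrix} 1 & 0 \\ 1 & 1 \\ 1 & 2 \end{bmatrix}$ and $b = \begin{bmatrix} 6 \\ 0 \\ 0 \end{bmatrix}$ find $\hat{x}$ and p and P.

Solution Compute the square matrix $A^T A$ and also the vector $A^T b$:

$$A^T A = \begin{bmatrix} 1 & 1 & 1 \\ 0 & 1 & 2 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 1 & 1 \\ 1 & 2 \end{bmatrix} = \begin{bmatrix} 3 & 3 \\ 3 & 5 \end{bmatrix} \quad\text{and}\quad A^T b = \begin{bmatrix} 1 & 1 & 1 \\ 0 & 1 & 2 \end{bmatrix} \begin{bmatrix} 6 \\ 0 \\ 0 \end{bmatrix} = \begin{bmatrix} 6 \\ 0 \end{bmatrix}.$$

Now solve the normal equation $A^T A\hat{x} = A^T b$ to find $\hat{x}$:

$$\begin{bmatrix} 3 & 3 \\ 3 & 5 \end{bmatrix} \begin{bmatrix} \hat{x}_1 \\ \hat{x}_2 \end{bmatrix} = \begin{bmatrix} 6 \\ 0 \end{bmatrix} \quad\text{gives}\quad \hat{x} = \begin{bmatrix} \hat{x}_1 \\ \hat{x}_2 \end{bmatrix} = \begin{bmatrix} 5 \\ -3 \end{bmatrix}.$$

The combination $p = A\hat{x}$ is the projection of b onto the column space of A:

$$p = 5 \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} - 3 \begin{bmatrix} 0 \\ 1 \\ 2 \end{bmatrix} = \begin{bmatrix} 5 \\ 2 \\ -1 \end{bmatrix}. \quad\text{The error is}\quad e = b - p = \begin{bmatrix} 1 \\ -2 \\ 1 \end{bmatrix}. \qquad (9)$$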


Two checks on the calculation. First, the error e = (1, -2, 1) is perpendicular to both
columns (1, 1, 1) and (0, 1, 2). Second, the final P times b = (6, 0, 0) correctly gives
p = (5, 2, -1). That solves the problem for one particular b.
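The steps of Example 3 translate almost line for line into code. This is a small sketch under stated assumptions: the helpers `transpose`, `matmul`, `matvec`, and `solve_2x2` are hypothetical names written for this illustration, and the 2 by 2 system is solved by Cramer's rule, which is a convenient choice here and not the method of the text.

```python
from fractions import Fraction

def transpose(M):
    return [list(col) for col in zip(*M)]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def matvec(X, v):
    return [sum(xij * vj for xij, vj in zip(row, v)) for row in X]

def solve_2x2(M, rhs):
    """Solve a 2 by 2 system by Cramer's rule (M must be invertible)."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    x1 = Fraction(rhs[0] * M[1][1] - M[0][1] * rhs[1], det)
    x2 = Fraction(M[0][0] * rhs[1] - rhs[0] * M[1][0], det)
    return [x1, x2]

A = [[1, 0], [1, 1], [1, 2]]
b = [6, 0, 0]

At = transpose(A)
AtA = matmul(At, A)            # coefficient matrix of the normal equations
Atb = matvec(At, b)            # right side of the normal equations
x_hat = solve_2x2(AtA, Atb)    # best combination of the columns
p = matvec(A, x_hat)           # projection p = A x_hat
e = [bi - pi for bi, pi in zip(b, p)]   # error e = b - p

print("AtA =", AtA, " Atb =", Atb)
print("x_hat =", [str(c) for c in x_hat])
print("p =", [str(c) for c in p], " e =", [str(c) for c in e])
print("A^T e =", [str(c) for c in matvec(At, e)])   # e is perpendicular to both columns
```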

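To find p = Pb for every b, compute $P = A(A^T A)^{-1} A^T$. The determinant of $A^T A$ is
15 - 9 = 6; then $(A^T A)^{-1}$ is easy. Multiply A times $(A^T A)^{-1}$ times $A^T$ to reach P:

$$(A^T A)^{-1} = \frac{1}{6} \begin{bmatrix} 5 & -3 \\ -3 & 3 \end{bmatrix} \quad\text{and}\quad P = \frac{1}{6} \begin{bmatrix} 5 & 2 & -1 \\ 2 & 2 & 2 \\ -1 & 2 & 5 \end{bmatrix}. \qquad (10)$$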


We must have $P^2 = P$, because a second projection doesn't change the first projection.
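The product $A(A^T A)^{-1} A^T$ for the matrix of Example 3 can be formed explicitly to confirm the entries of P and the property $P^2 = P$. A minimal sketch follows; the function names are illustrative assumptions, and `inverse_2x2` uses the standard 2 by 2 inverse formula.

```python
from fractions import Fraction

def transpose(M):
    return [list(col) for col in zip(*M)]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def inverse_2x2(M):
    """Inverse of a 2 by 2 matrix: swap the diagonal, negate the rest, divide by det."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [[Fraction(M[1][1], det), Fraction(-M[0][1], det)],
            [Fraction(-M[1][0], det), Fraction(M[0][0], det)]]

A = [[1, 0], [1, 1], [1, 2]]
At = transpose(A)
AtA_inv = inverse_2x2(matmul(At, A))       # (A^T A)^{-1}
P = matmul(matmul(A, AtA_inv), At)         # P = A (A^T A)^{-1} A^T

Pb = [sum(pij * bj for pij, bj in zip(row, [6, 0, 0])) for row in P]

for row in P:
    print([str(c) for c in row])
print("P^2 == P:", matmul(P, P) == P)
print("P symmetric:", P == transpose(P))
print("Pb =", [str(c) for c in Pb])
```

Note that the inverse is taken only of the square matrix $A^T A$, never of the rectangular A.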


Warning The matrix $P = A(A^T A)^{-1} A^T$ is deceptive. You might try to split $(A^T A)^{-1}$
into $A^{-1}$ times $(A^T)^{-1}$. If you make that mistake, and substitute it into P, you will find
$P = A A^{-1} (A^T)^{-1} A^T$. Apparently everything cancels. This looks like P = I, the identity
matrix. We want to say why this is wrong.

The matrix A is rectangular. It has no inverse matrix. We cannot split $(A^T A)^{-1}$ into
$A^{-1}$ times $(A^T)^{-1}$ because there is no $A^{-1}$ in the first place.

In our experience, a problem that involves a rectangular matrix almost always leads to
$A^T A$. When A has independent columns, $A^T A$ is invertible. This fact is so crucial that we
state it clearly and give a proof.

$A^T A$ is invertible if and only if A has linearly independent columns.

Proof $A^T A$ is a square matrix (n by n). For every matrix A, we will now show that
$A^T A$ has the same nullspace as A. When the columns of A are linearly independent, its
nullspace contains only the zero vector. Then $A^T A$, with this same nullspace, is invertible.
Let A be any matrix. If x is in its nullspace, then Ax = 0. Multiplying by $A^T$ gives
$A^T A x = 0$. So x is also in the nullspace of $A^T A$.

Now start with the nullspace of $A^T A$. From $A^T A x = 0$ we must prove Ax = 0. We
can't multiply by $(A^T)^{-1}$, which generally doesn't exist. Just multiply by $x^T$:

$$x^T A^T A x = 0 \quad\text{or}\quad (Ax)^T (Ax) = 0 \quad\text{or}\quad \|Ax\|^2 = 0.$$

This says: If $A^T A x = 0$ then Ax has length zero. Therefore Ax = 0. Every vector x in
one nullspace is in the other nullspace. If $A^T A$ has dependent columns, so has A. If $A^T A$
has independent columns, so has A. This is the good case:


When A has independent columns, $A^T A$ is square, symmetric, and invertible.

To repeat for emphasis: $A^T A$ is (n by m) times (m by n). Then $A^T A$ is square (n by n).
It is symmetric, because its transpose is $(A^T A)^T = A^T (A^T)^T$ which equals $A^T A$. We just
proved that $A^T A$ is invertible, provided A has independent columns. Watch the difference
between dependent and independent columns:


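$$\underbrace{\begin{bmatrix} 1 & 1 & 0 \\ 2 & 2 & 0 \end{bmatrix}}_{A^T} \underbrace{\begin{bmatrix} 1 & 2 \\ 1 & 2 \\ 0 & 0 \end{bmatrix}}_{A} = \underbrace{\begin{bmatrix} 2 & 4 \\ 4 & 8 \end{bmatrix}}_{A^T A} \qquad \underbrace{\begin{bmatrix} 1 & 1 & 0 \\ 2 & 2 & 1 \end{bmatrix}}_{A^T} \underbrace{\begin{bmatrix} 1 & 2 \\ 1 & 2 \\ 0 & 1 \end{bmatrix}}_{A} = \underbrace{\begin{bmatrix} 2 & 4 \\ 4 & 9 \end{bmatrix}}_{A^T A}$$

Left: dependent columns in A, singular $A^T A$. Right: independent columns in A, invertible $A^T A$.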
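The contrast between the two matrices shows up in the determinant of $A^T A$. The sketch below recomputes both products; `gram` and `det_2x2` are hypothetical helper names chosen for this illustration.

```python
def gram(A):
    """Return A^T A for a matrix A given as a list of rows."""
    cols = list(zip(*A))
    return [[sum(x * y for x, y in zip(ci, cj)) for cj in cols] for ci in cols]

def det_2x2(M):
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

A_dependent = [[1, 2], [1, 2], [0, 0]]     # column 2 = 2 * column 1
A_independent = [[1, 2], [1, 2], [0, 1]]   # columns are independent

G_dep = gram(A_dependent)
G_ind = gram(A_independent)

print(G_dep, "determinant", det_2x2(G_dep))   # zero determinant: singular
print(G_ind, "determinant", det_2x2(G_ind))   # nonzero determinant: invertible
```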


Very brief summary To find the projection $p = \hat{x}_1 a_1 + \cdots + \hat{x}_n a_n$, solve $A^T A\hat{x} = A^T b$.
This gives $\hat{x}$. The projection is $A\hat{x}$ and the error is $e = b - p = b - A\hat{x}$. The projection
matrix $P = A(A^T A)^{-1} A^T$ gives p = Pb.

This matrix satisfies $P^2 = P$. The distance from b to the subspace is $\|e\|$.


REVIEW OF THE KEY IDEAS

1. The projection of b onto the line through a is $p = a\hat{x} = a(a^T b / a^T a)$.
2. The rank one projection matrix $P = a a^T / a^T a$ multiplies b to produce p.
3. Projecting b onto a subspace leaves e = b - p perpendicular to the subspace.
4. When A has full rank n, the equation $A^T A\hat{x} = A^T b$ leads to $\hat{x}$ and $p = A\hat{x}$.
5. The projection matrix $P = A(A^T A)^{-1} A^T$ has $P^T = P$ and $P^2 = P$.
WORKED EXAMPLES

4.2 A Project the vector b = (3, 4, 4) onto the line through a = (2, 2, 1) and then
onto the plane that also contains a* = (1, 0, 0). Check that the first error vector b - p
is perpendicular to a, and the second error vector e* = b - p* is also perpendicular to a*.

Find the 3 by 3 projection matrix P onto that plane of a and a*. Find a vector whose
projection onto the plane is the zero vector.


Solution The projection of b = (3, 4, 4) onto the line through a = (2, 2, 1) is p = 2a:

$$\text{Onto a line}\quad p = \frac{a^T b}{a^T a}\,a = \frac{18}{9}(2, 2, 1) = (4, 4, 2).$$

The error vector e = b - p = (-1, 0, 2) is perpendicular to a. So p is correct.
The plane of a = (2, 2, 1) and a* = (1, 0, 0) is the column space of A = [a a*]:

$$A = \begin{bmatrix} 2 & 1 \\ 2 & 0 \\ 1 & 0 \end{bmatrix} \quad A^T A = \begin{bmatrix} 9 & 2 \\ 2 & 1 \end{bmatrix} \quad (A^T A)^{-1} = \frac{1}{5} \begin{bmatrix} 1 & -2 \\ -2 & 9 \end{bmatrix} \quad P = \begin{bmatrix} 1 & 0 & 0 \\ 0 & .8 & .4 \\ 0 & .4 & .2 \end{bmatrix}$$

Then p* = Pb = (3, 4.8, 2.4). The error e* = b - p* = (0, -.8, 1.6) is perpendicular
to a and a*. This e* is in the nullspace of P and its projection is zero! Note $P^2 = P$.


4.2 B Suppose your pulse is measured at x = 70 beats per minute, then at x = 80,
then at x = 120. Those three equations Ax = b in one unknown have $A^T = [1\ 1\ 1]$ and
b = (70, 80, 120). The best $\hat{x}$ is the ______ of 70, 80, 120. Use calculus and projection:

1. Minimize $E = (x - 70)^2 + (x - 80)^2 + (x - 120)^2$ by solving dE/dx = 0.
2. Project b = (70, 80, 120) onto a = (1, 1, 1) to find $\hat{x} = a^T b / a^T a$.


Solution The closest horizontal line to the heights 70, 80, 120 is the average $\hat{x} = 90$:

$$\frac{dE}{dx} = 2(x - 70) + 2(x - 80) + 2(x - 120) = 0 \quad\text{gives}\quad \hat{x} = \frac{70 + 80 + 120}{3} = 90$$

$$\text{Projection:}\quad \hat{x} = \frac{a^T b}{a^T a} = \frac{(1, 1, 1)^T (70, 80, 120)}{(1, 1, 1)^T (1, 1, 1)} = \frac{70 + 80 + 120}{3} = 90.$$
4.2 C In recursive least squares, a fourth measurement 130 changes $\hat{x}_{old}$ to $\hat{x}_{new}$.
Compute $\hat{x}_{new}$ and verify the update formula $\hat{x}_{new} = \hat{x}_{old} + \frac{1}{4}(130 - \hat{x}_{old})$.

Going from 999 to 1000 measurements, $\hat{x}_{new} = \hat{x}_{old} + \frac{1}{1000}(b_{1000} - \hat{x}_{old})$ would only
need $\hat{x}_{old}$ and the latest value $b_{1000}$. We don't have to average all 1000 numbers!
Solution The new measurement $b_4 = 130$ adds a fourth equation and $\hat{x}$ is updated to 100.
You can average $b_1, b_2, b_3, b_4$ or combine the average of $b_1, b_2, b_3$ with $b_4$:

$$\frac{70 + 80 + 120 + 130}{4} = 100 \quad\text{is also}\quad \hat{x}_{old} + \frac{1}{4}(b_4 - \hat{x}_{old}) = 90 + \frac{1}{4}(40).$$

The update from 999 to 1000 measurements shows the "gain matrix" $\frac{1}{1000}$ in a Kalman
filter multiplying the prediction error $b_{new} - \hat{x}_{old}$. Notice $\frac{1}{1000} = \frac{1}{999} - \frac{1}{999000}$:

$$\hat{x}_{new} = \frac{b_1 + \cdots + b_{1000}}{1000} = \frac{b_1 + \cdots + b_{999}}{999} + \frac{1}{1000}\left(b_{1000} - \frac{b_1 + \cdots + b_{999}}{999}\right).$$
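The update rule can be run one measurement at a time. The following sketch assumes the simplest model (a single unknown constant, so the least squares answer is the running average); the function name `update_average` and its argument names are hypothetical.

```python
from fractions import Fraction

def update_average(x_old, count_old, b_new):
    """One recursive least squares step for fitting a constant.

    x_old is the average of count_old earlier measurements; the gain
    1 / (count_old + 1) multiplies the prediction error b_new - x_old.
    """
    gain = Fraction(1, count_old + 1)
    return x_old + gain * (b_new - x_old)

# Start from the first pulse reading, then feed in the rest one by one.
x_hat = Fraction(70)
history = [x_hat]
for count, b in enumerate([80, 120, 130], start=1):
    x_hat = update_average(x_hat, count, b)
    history.append(x_hat)

print([str(x) for x in history])
```

Each step needs only the previous estimate, the number of earlier measurements, and the newest value, so the older data never has to be revisited.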

Problem Set 4.2 
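Questions 1-9 ask for projections onto lines. Also errors e = b - p and matrices P.

1 Project the vector b onto the line through a. Check that e is perpendicular to a:

$$\text{(a)}\ b = \begin{bmatrix} 1 \\ 2 \\ 2 \end{bmatrix} \text{ and } a = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} \qquad \text{(b)}\ b = \begin{bmatrix} 1 \\ 3 \\ 1 \end{bmatrix} \text{ and } a = \begin{bmatrix} -1 \\ -3 \\ -1 \end{bmatrix}$$

2 Draw the projection of b onto a and also compute it from $p = \hat{x}a$: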


w b= [5 and «=(4| (b) s=] and e=| l 


3 In Problem 1, find the projection matrix $P = a a^T / a^T a$ onto the line through each
vector a. Verify in both cases that $P^2 = P$. Multiply Pb in each case to compute
the projection p.

4 Construct the projection matrices $P_1$ and $P_2$ onto the lines through the a's in Problem 2.
Is it true that $(P_1 + P_2)^2 = P_1 + P_2$? This would be true if $P_1 P_2 = 0$.

5 Compute the projection matrices $a a^T / a^T a$ onto the lines through $a_1 = (-1, 2, 2)$ and
$a_2 = (2, 2, -1)$. Multiply those projection matrices and explain why their product
$P_1 P_2$ is what it is.

6 Project b = (1, 0, 0) onto the lines through $a_1$ and $a_2$ in Problem 5 and also onto
$a_3 = (2, -1, 2)$. Add up the three projections $p_1 + p_2 + p_3$.

7 Continuing Problems 5-6, find the projection matrix $P_3$ onto $a_3 = (2, -1, 2)$. Verify
that $P_1 + P_2 + P_3 = I$. The basis $a_1, a_2, a_3$ is orthogonal!

8 Project the vector b = (1, 1) onto the lines through $a_1 = (1, 0)$ and $a_2 = (1, 2)$.
Draw the projections $p_1$ and $p_2$ and add $p_1 + p_2$. The projections do not add to b
because the a's are not orthogonal.
[Figures for Questions 5-6-7 and Questions 8-9-10]


9 In Problem 8, the projection of b onto the plane of $a_1$ and $a_2$ will equal b. Find
$P = A(A^T A)^{-1} A^T$ for $A = [\,a_1\ a_2\,] = \begin{bmatrix} 1 & 1 \\ 0 & 2 \end{bmatrix}$.

10 Project $a_1 = (1, 0)$ onto $a_2 = (1, 2)$. Then project the result back onto $a_1$. Draw
these projections and multiply the projection matrices $P_1 P_2$: Is this a projection?


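Questions 11-20 ask for projections, and projection matrices, onto subspaces.

11 Project b onto the column space of A by solving $A^T A\hat{x} = A^T b$ and $p = A\hat{x}$:

$$\text{(a)}\ A = \begin{bmatrix} 1 & 1 \\ 0 & 1 \\ 0 & 0 \end{bmatrix} \text{ and } b = \begin{bmatrix} 2 \\ 3 \\ 4 \end{bmatrix} \qquad \text{(b)}\ A = \begin{bmatrix} 1 & 1 \\ 1 & 1 \\ 0 & 1 \end{bmatrix} \text{ and } b = \begin{bmatrix} 4 \\ 4 \\ 6 \end{bmatrix}$$

Find e = b - p. It should be perpendicular to the columns of A.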


12 Compute the projection matrices $P_1$ and $P_2$ onto the column spaces in Problem 11.
Verify that $P_1 b$ gives the first projection $p_1$. Also verify $P_2^2 = P_2$.

13 (Quick and Recommended) Suppose A is the 4 by 4 identity matrix with its last
column removed. A is 4 by 3. Project b = (1, 2, 3, 4) onto the column space of A.
What shape is the projection matrix P and what is P?

14 Suppose b equals 2 times the first column of A. What is the projection of b onto
the column space of A? Is P = I for sure in this case? Compute p and P when
b = (0, 2, 4) and the columns of A are (0, 1, 2) and (1, 2, 0).

15 If A is doubled, then $P = 2A(4A^T A)^{-1} 2A^T$. This is the same as $A(A^T A)^{-1} A^T$. The
column space of 2A is the same as ______. Is $\hat{x}$ the same for A and 2A?

16 What linear combination of (1, 2, -1) and (1, 0, 1) is closest to b = (2, 1, 1)?

17 (Important) If $P^2 = P$ show that $(I - P)^2 = I - P$. When P projects onto the
column space of A, I - P projects onto the ______.
18 (a) If P is the 2 by 2 projection matrix onto the line through (1, 1), then I - P is
the projection matrix onto ______.

(b) If P is the 3 by 3 projection matrix onto the line through (1, 1, 1), then I - P
is the projection matrix onto ______.

19 To find the projection matrix onto the plane x - y - 2z = 0, choose two vectors in
that plane and make them the columns of A. The plane should be the column space.
Then compute $P = A(A^T A)^{-1} A^T$.

20 To find the projection matrix P onto the same plane x - y - 2z = 0, write down a
vector e that is perpendicular to that plane. Compute the projection $Q = e e^T / e^T e$
and then P = I - Q.

Questions 21-26 show that projection matrices satisfy $P^2 = P$ and $P^T = P$.

21 Multiply the matrix $P = A(A^T A)^{-1} A^T$ by itself. Cancel to prove that $P^2 = P$.
Explain why P(Pb) always equals Pb: The vector Pb is in the column space so its
projection is ______.

22 Prove that $P = A(A^T A)^{-1} A^T$ is symmetric by computing $P^T$. Remember that the
inverse of a symmetric matrix is symmetric.

23 If A is square and invertible, the warning against splitting $(A^T A)^{-1}$ does not apply.
It is true that $A A^{-1} (A^T)^{-1} A^T = I$. When A is invertible, why is P = I? What is
the error e?

24 The nullspace of $A^T$ is ______ to the column space C(A). So if $A^T b = 0$, the
projection of b onto C(A) should be p = ______. Check that $P = A(A^T A)^{-1} A^T$
gives this answer.

25 The projection matrix P onto an n-dimensional subspace has rank r = n.
Reason: The projections Pb fill the subspace S. So S is the ______ of P.

26 If an m by m matrix has $A^2 = A$ and its rank is m, prove that A = I.

27 The important fact that ends the section is this: If $A^T A x = 0$ then Ax = 0.
New Proof: The vector Ax is in the nullspace of ______. Ax is always in the column
space of ______. To be in both of those perpendicular spaces, Ax must be zero.

28 Use $P^T = P$ and $P^2 = P$ to prove that the length squared of column 2 always
equals the diagonal entry $P_{22}$. This number is $\frac{2}{6} = \frac{4}{36} + \frac{4}{36} + \frac{4}{36}$ for

$$P = \frac{1}{6} \begin{bmatrix} 5 & 2 & -1 \\ 2 & 2 & 2 \\ -1 & 2 & 5 \end{bmatrix}.$$

29 If B has rank m (full row rank, independent rows) show that $B B^T$ is invertible.
Challenge Problems

30 (a) Find the projection matrix $P_C$ onto the column space of A (after looking closely
at the matrix!)

$$A = \begin{bmatrix} 3 & 6 & 6 \\ 4 & 8 & 8 \end{bmatrix}$$

(b) Find the 3 by 3 projection matrix $P_R$ onto the row space of A. Multiply
$B = P_C A P_R$. Your answer B should be a little surprising. Can you explain it?

31 In $R^m$, suppose I give you b and p, and p is a combination of $a_1, \ldots, a_n$. How
would you test to see if p is the projection of b onto the subspace spanned by the
a's?

32 Suppose $P_1$ is the projection matrix onto the 1-dimensional subspace spanned by
the first column of A. Suppose $P_2$ is the projection matrix onto the 2-dimensional
column space of A. After thinking a little, compute the product $P_2 P_1$.

$$A = \begin{bmatrix} 1 & 0 \\ 2 & 1 \\ 0 & 1 \end{bmatrix}$$

33 $P_1$ and $P_2$ are projections onto subspaces S and T. What is the requirement on
those subspaces to have $P_1 P_2 = P_2 P_1$?

34 If A has r independent columns and B has r independent rows, AB is invertible.
Proof: When A is m by r with independent columns, we know that $A^T A$ is invertible.
If B is r by n with independent rows, show that $B B^T$ is invertible. (Take $A = B^T$.)

Now show that AB has rank r. Hint: Why does $A^T A B B^T$ have rank r? That matrix
multiplication by $A^T$ and $B^T$ cannot increase the rank of AB, by Problem 3.6:26.
4.3 Least Squares Approximations

It often happens that Ax = b has no solution. The usual reason is: too many equations.
The matrix has more rows than columns. There are more equations than unknowns
(m is greater than n). The n columns span a small part of m-dimensional space. Unless all
measurements are perfect, b is outside that column space. Elimination reaches an
impossible equation and stops. But we can't stop just because measurements include noise.

To repeat: We cannot always get the error e = b - Ax down to zero. When e is zero,
x is an exact solution to Ax = b. When the length of e is as small as possible, $\hat{x}$ is a
least squares solution. Our goal in this section is to compute $\hat{x}$ and use it. These are real
problems and they need an answer.

The previous section emphasized p (the projection). This section emphasizes $\hat{x}$ (the
least squares solution). They are connected by $p = A\hat{x}$. The fundamental equation is still
$A^T A\hat{x} = A^T b$. Here is a short unofficial way to reach this equation:

When Ax = b has no solution, multiply by $A^T$ and solve $A^T A\hat{x} = A^T b$.


Example 1 A crucial application of least squares is fitting a straight line to m points.
Start with three points: Find the closest line to the points (0, 6), (1, 0), and (2, 0).

No straight line b = C + Dt goes through those three points. We are asking for two
numbers C and D that satisfy three equations. Here are the equations at t = 0, 1, 2 to
match the given values b = 6, 0, 0:

t = 0 The first point is on the line b = C + Dt if $C + D \cdot 0 = 6$
t = 1 The second point is on the line b = C + Dt if $C + D \cdot 1 = 0$
t = 2 The third point is on the line b = C + Dt if $C + D \cdot 2 = 0$.

This 3 by 2 system has no solution: b = (6, 0, 0) is not a combination of the columns
(1, 1, 1) and (0, 1, 2). Read off A, x, and b from those equations:

$$A = \begin{bmatrix} 1 & 0 \\ 1 & 1 \\ 1 & 2 \end{bmatrix} \quad x = \begin{bmatrix} C \\ D \end{bmatrix} \quad b = \begin{bmatrix} 6 \\ 0 \\ 0 \end{bmatrix} \qquad Ax = b \text{ is not solvable.}$$

The same numbers were in Example 3 in the last section. We computed $\hat{x} = (5, -3)$.
Those numbers are the best C and D, so 5 - 3t will be the best line for the 3 points.
We must connect projections to least squares, by explaining why $A^T A\hat{x} = A^T b$.

In practical problems, there could easily be m = 100 points instead of m = 3. They
don't exactly match any straight line C + Dt. Our numbers 6, 0, 0 exaggerate the error so
you can see $e_1$, $e_2$, and $e_3$ in Figure 4.6.


Minimizing the Error 


How do we make the error e = b - Ax as small as possible? This is an important question
with a beautiful answer. The best x (called $\hat{x}$) can be found by geometry or algebra or
calculus: 90° angle or project using P or set the derivative of the error to zero.

By geometry Every Ax lies in the plane of the columns (1, 1, 1) and (0, 1, 2). In that
plane, we look for the point closest to b. The nearest point is the projection p.

The best choice for $A\hat{x}$ is p. The smallest possible error is e = b - p. The three points at
heights $(p_1, p_2, p_3)$ do lie on a line, because p is in the column space. In fitting a straight
line, $\hat{x}$ gives the best choice for (C, D).


By algebra Every vector b splits into two parts. The part in the column space is p.
The perpendicular part in the nullspace of $A^T$ is e. There is an equation we cannot solve
(Ax = b). There is an equation $A\hat{x} = p$ we do solve (by removing e):

$$Ax = b = p + e \ \text{ is impossible;} \qquad A\hat{x} = p \ \text{ is solvable.} \qquad (1)$$

The solution to $A\hat{x} = p$ leaves the least possible error (which is e):

$$\text{Squared length for any } x \qquad \|Ax - b\|^2 = \|Ax - p\|^2 + \|e\|^2. \qquad (2)$$

This is the law $c^2 = a^2 + b^2$ for a right triangle. The vector Ax - p in the column space is
perpendicular to e in the left nullspace. We reduce Ax - p to zero by choosing x to be $\hat{x}$.
That leaves the smallest possible error $e = (e_1, e_2, e_3)$.
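Equation (2) holds for every x, not only for the best one, and this can be tested on the three-point example. The sketch below is an illustration with assumed helper names (`matvec`, `sq_len`, `split_error`); it uses the values p = (5, 2, -1) and e = (1, -2, 1) computed in the previous section.

```python
def matvec(A, x):
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in A]

def sq_len(v):
    """Squared length ||v||^2."""
    return sum(vi * vi for vi in v)

A = [[1, 0], [1, 1], [1, 2]]
b = [6, 0, 0]
p = [5, 2, -1]                              # projection of b onto the column space
e = [bi - pi for bi, pi in zip(b, p)]       # error e = b - p

def split_error(x):
    """Return (||Ax - b||^2, ||Ax - p||^2, ||e||^2) for a trial vector x."""
    Ax = matvec(A, x)
    total = sq_len([u - v for u, v in zip(Ax, b)])
    inside = sq_len([u - v for u, v in zip(Ax, p)])
    return total, inside, sq_len(e)

for x in ([0, 0], [1, 1], [5, -3]):
    total, inside, err = split_error(x)
    print(x, total, "=", inside, "+", err)
```

Only the middle term depends on x, and it drops to zero at x = (5, -3), leaving the unavoidable part $\|e\|^2$.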

Notice what "smallest" means. The squared length of Ax - b is minimized:

The least squares solution $\hat{x}$ makes $E = \|Ax - b\|^2$ as small as possible.

Figure 4.6: Best line and projection: Two pictures, same problem. The line has heights
p = (5, 2, -1) with errors e = (1, -2, 1). The equations $A^T A\hat{x} = A^T b$ give $\hat{x} = (5, -3)$.
The best line is b = 5 - 3t and the projection is $p = 5a_1 - 3a_2$.

Figure 4.6a shows the closest line. It misses by distances $e_1, e_2, e_3 = 1, -2, 1$.
Those are vertical distances. The least squares line minimizes $E = e_1^2 + e_2^2 + e_3^2$.
Figure 4.6b shows the same problem in 3-dimensional space (b p e space). The vector
b is not in the column space of A. That is why we could not solve Ax = b. No line goes
through the three points. The smallest possible error is the perpendicular vector e. This is
$e = b - A\hat{x}$, the vector of errors (1, -2, 1) in the three equations. Those are the distances
from the best line. Behind both figures is the fundamental equation $A^T A\hat{x} = A^T b$.

Notice that the errors 1, -2, 1 add to zero. The error $e = (e_1, e_2, e_3)$ is perpendicular
to the first column (1, 1, 1) in A. The dot product gives $e_1 + e_2 + e_3 = 0$.


By calculus Most functions are minimized by calculus! The graph bottoms out and the
derivative in every direction is zero. Here the error function E to be minimized is a sum of
squares $e_1^2 + e_2^2 + e_3^2$ (the square of the error in each equation):

$$E = \|Ax - b\|^2 = (C + D \cdot 0 - 6)^2 + (C + D \cdot 1)^2 + (C + D \cdot 2)^2. \qquad (3)$$

The unknowns are C and D. With two unknowns there are two derivatives, both zero
at the minimum. They are "partial derivatives" because $\partial E / \partial C$ treats D as constant and
$\partial E / \partial D$ treats C as constant:

$$\begin{aligned} \partial E / \partial C &= 2(C + D \cdot 0 - 6) + 2(C + D \cdot 1) + 2(C + D \cdot 2) = 0 \\ \partial E / \partial D &= 2(C + D \cdot 0 - 6)(0) + 2(C + D \cdot 1)(1) + 2(C + D \cdot 2)(2) = 0. \end{aligned}$$

$\partial E / \partial D$ contains the extra factors 0, 1, 2 from the chain rule. (The last derivative from
$(C + 2D)^2$ was 2 times C + 2D times that extra 2.) In the C derivative the corresponding
factors are 1, 1, 1, because C is always multiplied by 1. It is no accident that 1, 1, 1 and
0, 1, 2 are the columns of A.

Now cancel 2 from every term and collect all C's and all D's:

Now cancel 2 from every term and collect all C’s and all D’s: 


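$$\begin{matrix} \text{The } C \text{ derivative is zero:} & 3C + 3D = 6 \\ \text{The } D \text{ derivative is zero:} & 3C + 5D = 0 \end{matrix} \qquad \text{This matrix } \begin{bmatrix} 3 & 3 \\ 3 & 5 \end{bmatrix} \text{ is } A^T A \qquad (4)$$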
These equations are identical with $A^T A\hat{x} = A^T b$. The best C and D are the components
of $\hat{x}$. The equations from calculus are the same as the "normal equations" from linear
algebra. These are the key equations of least squares:

The partial derivatives of $\|Ax - b\|^2$ are zero when $A^T A\hat{x} = A^T b$.

The solution is C = 5 and D = -3. Therefore b = 5 - 3t is the best line: it comes
closest to the three points. At t = 0, 1, 2 this line goes through p = 5, 2, -1.
It could not go through b = 6, 0, 0. The errors are 1, -2, 1. This is the vector e!
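The calculus argument can be spot-checked by evaluating E and its two partial derivatives directly from equation (3). This is a small sketch; the function names `E` and `grad_E` are chosen here for illustration.

```python
def E(C, D):
    """Sum of squared errors for the line C + D t at the points (0,6), (1,0), (2,0)."""
    return (C + D * 0 - 6) ** 2 + (C + D * 1) ** 2 + (C + D * 2) ** 2

def grad_E(C, D):
    """Partial derivatives (dE/dC, dE/dD), written term by term as in the text."""
    dC = 2 * (C + D * 0 - 6) + 2 * (C + D * 1) + 2 * (C + D * 2)
    dD = 2 * (C + D * 0 - 6) * 0 + 2 * (C + D * 1) * 1 + 2 * (C + D * 2) * 2
    return dC, dD

print("gradient at (5, -3):", grad_E(5, -3))
print("E at (5, -3):", E(5, -3))
# Nearby choices of (C, D) give a larger error.
for C, D in [(5, -2), (6, -3), (4, -3), (5, -4)]:
    print((C, D), E(C, D))
```

The minimum value of E equals $1^2 + (-2)^2 + 1^2$, the squared length of the error vector e.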


The Big Picture 


The key figure of this book shows the four subspaces and the true action of a matrix. The
vector x on the left side of Figure 4.3 went to b = Ax on the right side. In that figure x
was split into $x_r + x_n$. There were many solutions to Ax = b.

Figure 4.7: The projection $p = A\hat{x}$ is closest to b, so $\hat{x}$ minimizes $E = \|b - Ax\|^2$.
(Labels in the figure: the row space is all of $R^n$, with independent columns and nullspace = {0};
the column space lies inside $R^m$; $A\hat{x} = p$ is solvable because p is in the column space;
Ax = b is not solvable because b is not in the column space.)


In this section the situation is just the opposite. There are no solutions to Ax = b.
Instead of splitting up x we are splitting up b. Figure 4.7 shows the big picture for least
squares. Instead of Ax = b we solve $A\hat{x} = p$. The error e = b - p is unavoidable.

Notice how the nullspace N(A) is very small, just one point. With independent
columns, the only solution to Ax = 0 is x = 0. Then $A^T A$ is invertible. The equation
$A^T A\hat{x} = A^T b$ fully determines the best vector $\hat{x}$. The error has $A^T e = 0$.

Chapter 7 will have the complete picture, with all four subspaces included. Every x splits
into $x_r + x_n$, and every b splits into p + e. The best solution is $\hat{x}_r$ in the row space. We
can't help e and we don't want $x_n$. This leaves $A\hat{x} = p$.


Fitting a Straight Line 


Fitting a line is the clearest application of least squares. It starts with m > 2 points,
hopefully near a straight line. At times $t_1, \ldots, t_m$ those m points are at heights
$b_1, \ldots, b_m$. The best line C + Dt misses the points by vertical distances $e_1, \ldots, e_m$.
No line is perfect, and the least squares line minimizes $E = e_1^2 + \cdots + e_m^2$.

The first example in this section had three points in Figure 4.6. Now we allow m points
(and m can be large). The two components of $\hat{x}$ are still C and D.

A line goes through the m points when we exactly solve Ax = b. Generally we can't
do it. Two unknowns C and D determine a line, so A has only n = 2 columns. To fit the
m points, we are trying to solve m equations (and we only want two!):

$$Ax = b \quad\text{is}\quad \begin{matrix} C + Dt_1 = b_1 \\ C + Dt_2 = b_2 \\ \vdots \\ C + Dt_m = b_m \end{matrix} \quad\text{with}\quad A = \begin{bmatrix} 1 & t_1 \\ 1 & t_2 \\ \vdots & \vdots \\ 1 & t_m \end{bmatrix}. \qquad (5)$$
The column space is so thin that almost certainly b is outside of it. When b happens to lie
in the column space, the points happen to lie on a line. In that case b = p. Then Ax = b
is solvable and the errors are e = (0, ..., 0).

The closest line C + Dt has heights $p_1, \ldots, p_m$ with errors $e_1, \ldots, e_m$.
Solve $A^T A\hat{x} = A^T b$ for $\hat{x} = (C, D)$. The errors are $e_i = b_i - C - Dt_i$.


Fitting points by a straight line is so important that we give the two equations $A^T A\hat{x} =
A^T b$, once and for all. The two columns of A are independent (unless all times $t_i$ are the
same). So we turn to least squares and solve $A^T A\hat{x} = A^T b$.


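$$\text{Dot-product matrix}\quad A^T A = \begin{bmatrix} 1 & \cdots & 1 \\ t_1 & \cdots & t_m \end{bmatrix} \begin{bmatrix} 1 & t_1 \\ \vdots & \vdots \\ 1 & t_m \end{bmatrix} = \begin{bmatrix} m & \sum t_i \\ \sum t_i & \sum t_i^2 \end{bmatrix}. \qquad (6)$$

On the right side of the normal equation is the 2 by 1 vector $A^T b$:

$$A^T b = \begin{bmatrix} 1 & \cdots & 1 \\ t_1 & \cdots & t_m \end{bmatrix} \begin{bmatrix} b_1 \\ \vdots \\ b_m \end{bmatrix} = \begin{bmatrix} \sum b_i \\ \sum t_i b_i \end{bmatrix}.$$

In a specific problem, these numbers are given. The best $\hat{x} = (C, D)$ is in equation (9).

The line C + Dt that minimizes $e_1^2 + \cdots + e_m^2$ is determined by $A^T A\hat{x} = A^T b$:

$$\begin{bmatrix} m & \sum t_i \\ \sum t_i & \sum t_i^2 \end{bmatrix} \begin{bmatrix} C \\ D \end{bmatrix} = \begin{bmatrix} \sum b_i \\ \sum t_i b_i \end{bmatrix}.$$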


The vertical errors at the m points on the line are the components of e = b - p. This
error vector (the residual) $b - A\hat{x}$ is perpendicular to the columns of A (geometry). The
error is in the nullspace of $A^T$ (linear algebra). The best $\hat{x} = (C, D)$ minimizes the total
error E, the sum of squares:

$$E(x) = \|Ax - b\|^2 = (C + Dt_1 - b_1)^2 + \cdots + (C + Dt_m - b_m)^2.$$


When calculus sets the derivatives $\partial E / \partial C$ and $\partial E / \partial D$ to zero, it produces $A^T A\hat{x} = A^T b$.

Other least squares problems have more than two unknowns. Fitting by the best parabola
has n = 3 coefficients C, D, E (see below). In general we are fitting m data points
by n parameters $x_1, \ldots, x_n$. The matrix A has n columns and n < m. The derivatives
of $\|Ax - b\|^2$ give the n equations $A^T A\hat{x} = A^T b$. The derivative of a square is linear.
This is why the method of least squares is so popular.
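For a straight line, the normal equations need only the five numbers m, $\sum t_i$, $\sum t_i^2$, $\sum b_i$, and $\sum t_i b_i$. The sketch below builds the 2 by 2 system from those sums and solves it by Cramer's rule; the function name `fit_line` is hypothetical, and the solver is one simple choice among several.

```python
from fractions import Fraction

def fit_line(ts, bs):
    """Least squares line C + D t through the points (t_i, b_i).

    Solves [[m, St], [St, Stt]] [C, D] = [Sb, Stb], the normal equations
    for a straight line. Requires at least two different times.
    """
    m = len(ts)
    St = sum(ts)
    Stt = sum(t * t for t in ts)
    Sb = sum(bs)
    Stb = sum(t * b for t, b in zip(ts, bs))
    det = m * Stt - St * St            # zero only when all times are equal
    C = Fraction(Sb * Stt - St * Stb, det)
    D = Fraction(m * Stb - St * Sb, det)
    return C, D

# The three points (0, 6), (1, 0), (2, 0) from Example 1
ts, bs = [0, 1, 2], [6, 0, 0]
C, D = fit_line(ts, bs)
errors = [b - (C + D * t) for t, b in zip(ts, bs)]

print("C =", C, " D =", D)
print("errors =", [str(e) for e in errors], " sum =", sum(errors))
```

The errors add to zero, as they must whenever the column of ones is a column of A.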


Example 2 A has orthogonal columns when the measurement times $t_i$ add to zero.
Suppose b = 1, 2, 4 at times t = -2, 0, 2. Those times add to zero. The columns of A
have zero dot product:

$$\begin{matrix} C + D(-2) = 1 \\ C + D(0) = 2 \\ C + D(2) = 4 \end{matrix} \quad\text{or}\quad Ax = \begin{bmatrix} 1 & -2 \\ 1 & 0 \\ 1 & 2 \end{bmatrix} \begin{bmatrix} C \\ D \end{bmatrix} = \begin{bmatrix} 1 \\ 2 \\ 4 \end{bmatrix}.$$

Look at the zeros in $A^T A$:

$$A^T A\hat{x} = A^T b \quad\text{is}\quad \begin{bmatrix} 3 & 0 \\ 0 & 8 \end{bmatrix} \begin{bmatrix} C \\ D \end{bmatrix} = \begin{bmatrix} 7 \\ 6 \end{bmatrix}.$$

Main point: Now $A^T A$ is diagonal. We can solve separately for $C = \frac{7}{3}$ and $D = \frac{6}{8}$. The
zeros in $A^T A$ are dot products of perpendicular columns in A. The diagonal matrix $A^T A$,
with entries m = 3 and $t_1^2 + t_2^2 + t_3^2 = 8$, is virtually as good as the identity matrix.


Orthogonal columns are so helpful that it is worth moving the time origin to produce
them. To do that, subtract away the average time $\hat{t} = (t_1 + \cdots + t_m)/m$. The shifted times
$T_i = t_i - \hat{t}$ add to $\sum T_i = m\hat{t} - m\hat{t} = 0$. With the columns now orthogonal, $A^T A$ is
diagonal. Its entries are m and $T_1^2 + \cdots + T_m^2$. The best C and D have direct formulas:

$$T \text{ is } t - \hat{t} \qquad C = \frac{b_1 + \cdots + b_m}{m} \quad\text{and}\quad D = \frac{b_1 T_1 + \cdots + b_m T_m}{T_1^2 + \cdots + T_m^2}. \qquad (9)$$

The best line is C + DT or $C + D(t - \hat{t})$. The time shift that makes $A^T A$ diagonal is an
example of the Gram-Schmidt process: orthogonalize the columns in advance.
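The direct formulas in (9) avoid solving any system at all. Here is a minimal sketch of that shortcut, with `fit_line_centered` as a hypothetical function name; it returns the average time together with C and D, so the fitted line is C + D(t - t_mean).

```python
from fractions import Fraction

def fit_line_centered(ts, bs):
    """Fit C + D (t - t_mean) using the shifted times T_i = t_i - t_mean."""
    m = len(ts)
    t_mean = Fraction(sum(ts), m)
    Ts = [t - t_mean for t in ts]                 # shifted times add to zero
    C = Fraction(sum(bs), m)                      # C is the average height
    D = sum(b * T for b, T in zip(bs, Ts)) / sum(T * T for T in Ts)
    return t_mean, C, D

# Example 2: times already add to zero, so no shift is needed.
print(fit_line_centered([-2, 0, 2], [1, 2, 4]))

# Example 1: times 0, 1, 2 have average 1, so the line is C + D (t - 1).
t_mean, C, D = fit_line_centered([0, 1, 2], [6, 0, 0])
print("line: %s + (%s)(t - %s)" % (C, D, t_mean))
print("height at t = 0:", C + D * (0 - t_mean))   # agrees with the line 5 - 3t
```

For the three points of Example 1 the shifted form 2 - 3(t - 1) is the same line as 5 - 3t, only written around the average time.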


Fitting by a Parabola 


If we throw a ball, it would be crazy to fit the path by a straight line. A parabola b =
$C + Dt + Et^2$ allows the ball to go up and come down again (b is the height at time t).
The actual path is not a perfect parabola, but the whole theory of projectiles starts with that
approximation.

When Galileo dropped a stone from the Leaning Tower of Pisa, it accelerated.
The distance contains a quadratic term $\frac{1}{2}gt^2$. (Galileo's point was that the stone's mass
is not involved.) Without that $t^2$ term we could never send a satellite into the right orbit.
But even with a nonlinear function like $t^2$, the unknowns C, D, E appear linearly!
Choosing the best parabola is still a problem in linear algebra.


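Problem Fit heights $b_1, \ldots, b_m$ at times $t_1, \ldots, t_m$ by a parabola $C + Dt + Et^2$.

Solution With m > 3 points, the m equations for an exact fit are generally unsolvable:

$$\begin{matrix} C + Dt_1 + Et_1^2 = b_1 \\ \vdots \\ C + Dt_m + Et_m^2 = b_m \end{matrix} \quad\text{has the } m \text{ by 3 matrix}\quad A = \begin{bmatrix} 1 & t_1 & t_1^2 \\ \vdots & \vdots & \vdots \\ 1 & t_m & t_m^2 \end{bmatrix}. \qquad (10)$$

Least squares The closest parabola $C + Dt + Et^2$ chooses $\hat{x} = (C, D, E)$ to
satisfy the three normal equations $A^T A\hat{x} = A^T b$.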
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May I ask you to convert this to a problem of projection? The column space of A has 
dimension . The projection of b is p = AX, which combines the three columns 
using the coefficients C, D, E. The error at the first data point is e} = bı — C — Dt, — Er?. 
The total squared error is e? + . If you prefer to minimize by calculus, take the 
partial derivatives of Æ with respect to , , . These three derivatives will 
be zero when ¥ = (C, D, E) solves the 3 by 3 system of equations 

Section 8.5 has more least squares applications. The big one is Fourier series: approximating functions instead of vectors. The function to be minimized changes from a sum of squared errors $e_1^2 + \cdots + e_m^2$ to an integral of the squared error.


Example 3 For a parabola $b = C + Dt + Et^2$ to go through the three heights $b = 6, 0, 0$ when $t = 0, 1, 2$, the equations are

$$\begin{aligned} C + D \cdot 0 + E \cdot 0^2 &= 6 \\ C + D \cdot 1 + E \cdot 1^2 &= 0 \\ C + D \cdot 2 + E \cdot 2^2 &= 0. \end{aligned} \tag{11}$$

This is $Ax = b$. We can solve it exactly. Three data points give three equations and a square matrix. The solution is $x = (C, D, E) = (6, -9, 3)$. The parabola through the three points in Figure 4.8a is $b = 6 - 9t + 3t^2$.

What does this mean for projection? The matrix has three columns, which span the whole space $\mathbf{R}^3$. The projection matrix is the identity. The projection of $b$ is $b$. The error is zero. We didn't need $A^TA\hat{x} = A^Tb$, because we solved $Ax = b$. Of course we could multiply by $A^T$, but there is no reason to do it.

Figure 4.8 also shows a fourth point $b_4$ at time $t_4$. If that falls on the parabola, the new $Ax = b$ (four equations) is still solvable. When the fourth point is not on the parabola, we turn to $A^TA\hat{x} = A^Tb$. Will the least squares parabola stay the same, with all the error at the fourth point? Not likely!

The smallest error vector $(e_1, e_2, e_3, e_4)$ is perpendicular to $(1, 1, 1, 1)$, the first column of $A$. Least squares balances out the four errors, and they add to zero.
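Both situations can be checked numerically. The following is a minimal sketch, not the book's code: `solve` and `least_squares_parabola` are hypothetical helpers, and the fourth data point ($b_4 = 0$ at $t_4 = 3$) is an assumed value chosen only because it lies off the parabola $6 - 9t + 3t^2$.

```python
from fractions import Fraction

def solve(M, rhs):
    """Solve the square system M x = rhs by Gauss-Jordan elimination with exact fractions."""
    n = len(M)
    aug = [[Fraction(v) for v in row] + [Fraction(r)] for row, r in zip(M, rhs)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        aug[col] = [v / aug[col][col] for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[-1] for row in aug]

def least_squares_parabola(times, heights):
    """Solve the normal equations A^T A x = A^T b for x = (C, D, E)."""
    A = [[1, t, t * t] for t in times]
    AtA = [[sum(row[i] * row[j] for row in A) for j in range(3)] for i in range(3)]
    Atb = [sum(row[i] * b for row, b in zip(A, heights)) for i in range(3)]
    return solve(AtA, Atb)

# Three points: A is square and invertible, so the fit is exact.
exact = least_squares_parabola([0, 1, 2], [6, 0, 0])
print(exact)

# Four points (the fourth one is an assumed value off the parabola).
times, heights = [0, 1, 2, 3], [6, 0, 0, 0]
C, D, E = least_squares_parabola(times, heights)
errors = [b - (C + D * t + E * t * t) for t, b in zip(times, heights)]
print(sum(errors))   # the errors are perpendicular to (1, 1, 1, 1), so they add to zero
```

With three points the normal equations return the same $(6, -9, 3)$ as solving $Ax = b$ directly. With the extra point the coefficients change, and the error is shared among the four points instead of sitting entirely at the fourth.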
Figure 4.8: From Example 3: An exact fit of the parabola at $t = 0, 1, 2$ means that $p = b$ and $e = 0$. The point $b_4$ off the parabola makes $m > n$ and we need least squares.

■ REVIEW OF THE KEY IDEAS ■


1. The least squares solution $\hat{x}$ minimizes $E = \|Ax - b\|^2$. This is the sum of squares of the errors in the $m$ equations ($m > n$).

2. The best $\hat{x}$ comes from the normal equations $A^TA\hat{x} = A^Tb$.

3. To fit $m$ points by a line $b = C + Dt$, the normal equations give $C$ and $D$.

4. The heights of the best line are $p = (p_1, \ldots, p_m)$. The vertical distances to the data points are the errors $e = (e_1, \ldots, e_m)$.

5. If we try to fit $m$ points by a combination of $n < m$ functions, the $m$ equations $Ax = b$ are generally unsolvable. The $n$ equations $A^TA\hat{x} = A^Tb$ give the least squares solution: the combination with smallest MSE (mean square error).


■ WORKED EXAMPLES ■

4.3 A Start with nine measurements $b_1$ to $b_9$, all zero, at times $t = 1, \ldots, 9$. The tenth measurement $b_{10} = 40$ is an outlier. Find the best horizontal line $y = C$ to fit the ten points $(1, 0), (2, 0), \ldots, (9, 0), (10, 40)$ using three measures for the error $E$:

(1) Least squares $E_2 = e_1^2 + \cdots + e_{10}^2$ (then the normal equation for $C$ is linear)
(2) Least maximum error $E_\infty = |e_{\max}|$
(3) Least sum of errors $E_1 = |e_1| + \cdots + |e_{10}|$.

Solution (1) The least squares fit to $0, 0, \ldots, 0, 40$ by a horizontal line is $C = 4$:

$A$ = column of 1's, $A^TA = 10$, $A^Tb$ = sum of $b_i$ = 40. So $10C = 40$.

(2) The least maximum error requires $C = 20$, halfway between 0 and 40.

(3) The least sum requires $C = 0$ (!!). The sum of errors $9|C| + |40 - C|$ would increase if $C$ moves up from zero.

The least sum comes from the median measurement (the median of $0, \ldots, 0, 40$ is zero). Many statisticians feel that the least squares solution is too heavily influenced by outliers like $b_{10} = 40$, and they prefer least sum. But the equations become nonlinear.
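A brute-force comparison makes the three answers concrete. This sketch assumes a coarse search over the integer heights 0 through 40 (an assumption made only to keep the illustration short, not a general minimization method); the function names are invented for the example.

```python
from statistics import mean, median

b = [0] * 9 + [40]          # nine zeros and the outlier b10 = 40

def sum_squares(C):         # E_2: sum of squared errors
    return sum((v - C) ** 2 for v in b)

def max_error(C):           # E_infinity: largest absolute error
    return max(abs(v - C) for v in b)

def sum_abs(C):             # E_1: sum of absolute errors
    return sum(abs(v - C) for v in b)

candidates = range(0, 41)   # assumed search grid: integer heights only
best_l2 = min(candidates, key=sum_squares)
best_max = min(candidates, key=max_error)
best_l1 = min(candidates, key=sum_abs)
print(best_l2, best_max, best_l1)   # 4 20 0
print(mean(b), median(b))           # the mean matches best_l2, the median matches best_l1
```

The least squares height is the mean, the least maximum error height is the midpoint of the range, and the least sum height is the median.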

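Now find the least squares straight line $C + Dt$ through those ten points.

$$A^TA = \begin{bmatrix} m & \sum t_i \\ \sum t_i & \sum t_i^2 \end{bmatrix} = \begin{bmatrix} 10 & 55 \\ 55 & 385 \end{bmatrix} \qquad A^Tb = \begin{bmatrix} \sum b_i \\ \sum t_ib_i \end{bmatrix} = \begin{bmatrix} 40 \\ 400 \end{bmatrix}$$

Those come from equation (8). Then $A^TA\hat{x} = A^Tb$ gives $C = -8$ and $D = 24/11$.

What happens to $C$ and $D$ if you multiply the $b_i$ by 3 and then add 30 to get $b_{\text{new}} = (30, 30, \ldots, 150)$? Linearity allows us to rescale $b = (0, 0, \ldots, 40)$. Multiplying $b$ by 3 will multiply $C$ and $D$ by 3. Adding 30 to all $b_i$ will add 30 to $C$.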
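The numbers $C = -8$ and $D = 24/11$, and the effect of rescaling $b$, can be confirmed with exact arithmetic. A minimal sketch follows; `fit_line` is a hypothetical helper that applies Cramer's rule to the 2 by 2 normal equations.

```python
from fractions import Fraction

def fit_line(times, values):
    """Solve the 2 by 2 normal equations for the line C + D t (Cramer's rule, exact)."""
    m = len(times)
    st = sum(times)                                   # sum of t_i
    stt = sum(t * t for t in times)                   # sum of t_i^2
    sb = sum(values)                                  # sum of b_i
    stb = sum(t * b for t, b in zip(times, values))   # sum of t_i b_i
    det = Fraction(m * stt - st * st)
    C = (stt * sb - st * stb) / det
    D = (m * stb - st * sb) / det
    return C, D

times = list(range(1, 11))
b = [0] * 9 + [40]
C, D = fit_line(times, b)
print(C, D)                                   # -8 24/11

b_new = [3 * v + 30 for v in b]               # (30, 30, ..., 150)
C_new, D_new = fit_line(times, b_new)
print(C_new == 3 * C + 30, D_new == 3 * D)    # True True
```

The second check is the linearity statement: scaling $b$ scales both coefficients, and a constant shift of $b$ goes entirely into $C$.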
4.3 B Find the parabola $C + Dt + Et^2$ that comes closest (least squares error) to the values $b = (0, 0, 1, 0, 0)$ at the times $t = -2, -1, 0, 1, 2$. First write down the five equations $Ax = b$ in three unknowns $x = (C, D, E)$ for a parabola to go through the five points. No solution because no such parabola exists. Solve $A^TA\hat{x} = A^Tb$.

I would predict $D = 0$. Why should the best parabola be symmetric around $t = 0$? In $A^TA\hat{x} = A^Tb$, equation 2 for $D$ should uncouple from equations 1 and 3.


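Solution The five equations $Ax = b$ have a rectangular “Vandermonde” matrix $A$:

$$\begin{aligned} C + D(-2) + E(-2)^2 &= 0 \\ C + D(-1) + E(-1)^2 &= 0 \\ C + D(0) + E(0)^2 &= 1 \\ C + D(1) + E(1)^2 &= 0 \\ C + D(2) + E(2)^2 &= 0 \end{aligned} \qquad A = \begin{bmatrix} 1 & -2 & 4 \\ 1 & -1 & 1 \\ 1 & 0 & 0 \\ 1 & 1 & 1 \\ 1 & 2 & 4 \end{bmatrix} \qquad A^TA = \begin{bmatrix} 5 & 0 & 10 \\ 0 & 10 & 0 \\ 10 & 0 & 34 \end{bmatrix}$$

Those zeros in $A^TA$ mean that column 2 of $A$ is orthogonal to columns 1 and 3. We see this directly in $A$ (the times $-2, -1, 0, 1, 2$ are symmetric). The best $C, D, E$ in the parabola $C + Dt + Et^2$ come from $A^TA\hat{x} = A^Tb$, and $D$ is uncoupled:

$$\begin{bmatrix} 5 & 0 & 10 \\ 0 & 10 & 0 \\ 10 & 0 & 34 \end{bmatrix}\begin{bmatrix} C \\ D \\ E \end{bmatrix} = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} \quad\text{leads to}\quad \begin{aligned} C &= 34/70 \\ D &= 0 \text{ as predicted} \\ E &= -10/70 \end{aligned}$$
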
Problem Set 4.3 


Problems 1-11 use four data points $b = (0, 8, 8, 20)$ to bring out the key ideas.

1 With $b = 0, 8, 8, 20$ at $t = 0, 1, 3, 4$, set up and solve the normal equations $A^TA\hat{x} = A^Tb$. For the best straight line in Figure 4.9a, find its four heights $p_i$ and four errors $e_i$. What is the minimum value $E = e_1^2 + e_2^2 + e_3^2 + e_4^2$?

2 (Line $C + Dt$ does go through $p$'s) With $b = 0, 8, 8, 20$ at times $t = 0, 1, 3, 4$, write down the four equations $Ax = b$ (unsolvable). Change the measurements to $p = 1, 5, 13, 17$ and find an exact solution to $A\hat{x} = p$.

3 Check that $e = b - p = (-1, 3, -5, 3)$ is perpendicular to both columns of the same matrix $A$. What is the shortest distance $\|e\|$ from $b$ to the column space of $A$?

4 (By calculus) Write down $E = \|Ax - b\|^2$ as a sum of four squares; the last one is $(C + 4D - 20)^2$. Find the derivative equations $\partial E/\partial C = 0$ and $\partial E/\partial D = 0$. Divide by 2 to obtain the normal equations $A^TA\hat{x} = A^Tb$.

5 Find the height $C$ of the best horizontal line to fit $b = (0, 8, 8, 20)$. An exact fit would solve the unsolvable equations $C = 0$, $C = 8$, $C = 8$, $C = 20$. Find the 4 by 1 matrix $A$ in these equations and solve $A^TA\hat{x} = A^Tb$. Draw the horizontal line at height $\hat{x} = C$ and the four errors in $e$.

6 Project $b = (0, 8, 8, 20)$ onto the line through $a = (1, 1, 1, 1)$. Find $\hat{x} = a^Tb/a^Ta$ and the projection $p = \hat{x}a$. Check that $e = b - p$ is perpendicular to $a$, and find the shortest distance $\|e\|$ from $b$ to the line through $a$.

7 Find the closest line $b = Dt$, through the origin, to the same four points. An exact fit would solve $D \cdot 0 = 0$, $D \cdot 1 = 8$, $D \cdot 3 = 8$, $D \cdot 4 = 20$. Find the 4 by 1 matrix and solve $A^TA\hat{x} = A^Tb$. Redraw Figure 4.9a showing the best line $b = Dt$ and the $e$'s.

8 Project $b = (0, 8, 8, 20)$ onto the line through $a = (0, 1, 3, 4)$. Find $\hat{x} = D$ and $p = \hat{x}a$. The best $C$ in Problems 5-6 and the best $D$ in Problems 7-8 do not agree with the best $(C, D)$ in Problems 1-4. That is because $(1, 1, 1, 1)$ and $(0, 1, 3, 4)$ are ____ perpendicular.

9 For the closest parabola $b = C + Dt + Et^2$ to the same four points, write down the unsolvable equations $Ax = b$ in three unknowns $x = (C, D, E)$. Set up the three normal equations $A^TA\hat{x} = A^Tb$ (solution not required). In Figure 4.9a you are now fitting a parabola to 4 points. What is happening in Figure 4.9b?

10 For the closest cubic $b = C + Dt + Et^2 + Ft^3$ to the same four points, write down the four equations $Ax = b$. Solve them by elimination. In Figure 4.9a this cubic now goes exactly through the points. What are $p$ and $e$?

11 The average of the four times is $\hat{t} = \frac{1}{4}(0 + 1 + 3 + 4) = 2$. The average of the four $b$'s is $\hat{b} = \frac{1}{4}(0 + 8 + 8 + 20) = 9$.

(a) Verify that the best line goes through the center point $(\hat{t}, \hat{b}) = (2, 9)$.
(b) Explain why $C + D\hat{t} = \hat{b}$ comes from the first equation in $A^TA\hat{x} = A^Tb$.


Figure 4.9: Problems 1-11: The closest line $C + Dt$ matches $Ca_1 + Da_2$ in $\mathbf{R}^4$. (Labels in the figure: $b = (0, 8, 8, 20)$, the error $e$, the projection $p = Ca_1 + Da_2$, and the columns $a_1 = (1, 1, 1, 1)$ and $a_2 = (0, 1, 3, 4)$.)

Questions 12-16 introduce basic ideas of statistics, the foundation for least squares.

12 (Recommended) This problem projects $b = (b_1, \ldots, b_m)$ onto the line through $a = (1, \ldots, 1)$. We solve $m$ equations $ax = b$ in 1 unknown (by least squares).

(a) Solve $a^Ta\hat{x} = a^Tb$ to show that $\hat{x}$ is the mean (the average) of the $b$'s.

(b) Find $e = b - a\hat{x}$ and the variance $\|e\|^2$ and the standard deviation $\|e\|$.

(c) The horizontal line $\hat{b} = 3$ is closest to $b = (1, 2, 6)$. Check that $p = (3, 3, 3)$ is perpendicular to $e$ and find the 3 by 3 projection matrix $P$.

13 First assumption behind least squares: $Ax = b -$ (noise $e$ with mean zero). Multiply the error vectors $e = b - Ax$ by $(A^TA)^{-1}A^T$ to get $\hat{x} - x$ on the right. The estimation errors $\hat{x} - x$ also average to zero. The estimate $\hat{x}$ is unbiased.

14 Second assumption behind least squares: The $m$ errors $e_i$ are independent with variance $\sigma^2$, so the average of $(b - Ax)(b - Ax)^T$ is $\sigma^2I$. Multiply on the left by $(A^TA)^{-1}A^T$ and on the right by $A(A^TA)^{-1}$ to show that the average matrix $(\hat{x} - x)(\hat{x} - x)^T$ is $\sigma^2(A^TA)^{-1}$. This is the covariance matrix $P$ in section 8.6.

15 A doctor takes 4 readings of your heart rate. The best solution to $x = b_1, \ldots, x = b_4$ is the average $\hat{x}$ of $b_1, \ldots, b_4$. The matrix $A$ is a column of 1's. Problem 14 gives the expected error $(\hat{x} - x)^2$ as $\sigma^2(A^TA)^{-1} =$ ____. By averaging, the variance drops from $\sigma^2$ to $\sigma^2/4$.

16 If you know the average $\hat{x}_9$ of 9 numbers $b_1, \ldots, b_9$, how can you quickly find the average $\hat{x}_{10}$ with one more number $b_{10}$? The idea of recursive least squares is to avoid adding 10 numbers. What number multiplies $\hat{x}_9$ in computing $\hat{x}_{10}$?

$\hat{x}_{10} = \frac{1}{10}b_{10} +$ ____ $\hat{x}_9 = \frac{1}{10}(b_1 + \cdots + b_{10})$ as in Worked Example 4.2 C.


Questions 17-24 give more practice with $\hat{x}$ and $p$ and $e$.

17 Write down three equations for the line $b = C + Dt$ to go through $b = 7$ at $t = -1$, $b = 7$ at $t = 1$, and $b = 21$ at $t = 2$. Find the least squares solution $\hat{x} = (C, D)$ and draw the closest line.

18 Find the projection $p = A\hat{x}$ in Problem 17. This gives the three heights of the closest line. Show that the error vector is $e = (2, -6, 4)$. Why is $Pe = 0$?

19 Suppose the measurements at $t = -1, 1, 2$ are the errors $2, -6, 4$ in Problem 18. Compute $\hat{x}$ and the closest line to these new measurements. Explain the answer: $b = (2, -6, 4)$ is perpendicular to ____ so the projection is $p = 0$.

20 Suppose the measurements at $t = -1, 1, 2$ are $b = (5, 13, 17)$. Compute $\hat{x}$ and the closest line and $e$. The error is $e = 0$ because this $b$ is ____.

21 Which of the four subspaces contains the error vector $e$? Which contains $p$? Which contains $\hat{x}$? What is the nullspace of $A$?


22 Find the best line $C + Dt$ to fit $b = 4, 2, -1, 0, 0$ at times $t = -2, -1, 0, 1, 2$.

23 Is the error vector $e$ orthogonal to $b$ or $p$ or $e$ or $\hat{x}$? Show that $\|e\|^2$ equals $e^Tb$ which equals $b^Tb - p^Tb$. This is the smallest total error $E$.

24 The partial derivatives of $\|Ax\|^2$ with respect to $x_1, \ldots, x_n$ fill the vector $2A^TAx$. The derivatives of $2b^TAx$ fill the vector $2A^Tb$. So the derivatives of $\|Ax - b\|^2$ are zero when ____.

Challenge Problems

25 What condition on $(t_1, b_1), (t_2, b_2), (t_3, b_3)$ puts those three points onto a straight line? A column space answer is: $(b_1, b_2, b_3)$ must be a combination of $(1, 1, 1)$ and $(t_1, t_2, t_3)$. Try to reach a specific equation connecting the $t$'s and $b$'s. I should have thought of this question sooner!

26 Find the plane that gives the best fit to the 4 values $b = (0, 1, 3, 4)$ at the corners $(1, 0)$ and $(0, 1)$ and $(-1, 0)$ and $(0, -1)$ of a square. The equations $C + Dx + Ey = b$ at those 4 points are $Ax = b$ with 3 unknowns $x = (C, D, E)$. What is $A$? At the center $(0, 0)$ of the square, show that $C + Dx + Ey$ = average of the $b$'s.

27 (Distance between lines) The points $P = (x, x, x)$ and $Q = (y, 3y, -1)$ are on two lines in space that don't meet. Choose $x$ and $y$ to minimize the squared distance $\|P - Q\|^2$. The line connecting the closest $P$ and $Q$ is perpendicular to ____.

28 Suppose the columns of $A$ are not independent. How could you find a matrix $B$ so that $P = B(B^TB)^{-1}B^T$ does give the projection onto the column space of $A$? (The usual formula will fail when $A^TA$ is not invertible.)

29 Usually there will be exactly one hyperplane in $\mathbf{R}^n$ that contains the $n$ given points $x = 0, a_1, \ldots, a_{n-1}$. (Example for $n = 3$: There will be one plane containing $0, a_1, a_2$ unless ____.) What is the test to have exactly one plane in $\mathbf{R}^n$?

4.4 Orthogonal Bases and Gram-Schmidt


This section has two goals. The first is to see how orthogonality makes it easy to find $\hat{x}$ and $p$ and $P$. Dot products are zero, so $A^TA$ becomes a diagonal matrix. The second goal is to construct orthogonal vectors. We will pick combinations of the original vectors to produce right angles. Those original vectors are the columns of $A$, probably not orthogonal. The orthogonal vectors will be the columns of a new matrix $Q$.

From Chapter 3, a basis consists of independent vectors that span the space. The basis vectors could meet at any angle (except 0° and 180°). But every time we visualize axes, they are perpendicular. In our imagination, the coordinate axes are practically always orthogonal. This simplifies the picture and it greatly simplifies the computations.

The vectors $q_1, \ldots, q_n$ are orthogonal when their dot products $q_i \cdot q_j$ are zero. More exactly $q_i^Tq_j = 0$ whenever $i \neq j$. With one more step (just divide each vector by its length) the vectors become orthogonal unit vectors. Their lengths are all 1. Then the basis is called orthonormal.


The matrix $Q$ is easy to work with because $Q^TQ = I$. This repeats in matrix language that the columns $q_1, \ldots, q_n$ are orthonormal. $Q$ is not required to be square.

When row $i$ of $Q^T$ multiplies column $j$ of $Q$, the dot product is $q_i^Tq_j$. Off the diagonal ($i \neq j$) that dot product is zero by orthogonality. On the diagonal ($i = j$) the unit vectors give $q_i^Tq_i = \|q_i\|^2 = 1$. Often $Q$ is rectangular ($m > n$). Sometimes $m = n$.

When $Q$ is square, $Q^TQ = I$ means that $Q^T = Q^{-1}$: transpose = inverse.

If the columns are only orthogonal (not unit vectors), dot products still give a diagonal matrix (not the identity matrix). But this matrix is almost as good. The important thing is orthogonality; then it is easy to produce unit vectors.

To repeat: $Q^TQ = I$ even when $Q$ is rectangular. In that case $Q^T$ is only an inverse from the left. For square matrices we also have $QQ^T = I$, so $Q^T$ is the two-sided inverse of $Q$. The rows of a square $Q$ are orthonormal like the columns. The inverse is the transpose. In this square case we call $Q$ an orthogonal matrix.¹

Here are three examples of orthogonal matrices: rotation and permutation and reflection. The quickest test is to check $Q^TQ = I$.


Example 1 (Rotation) $Q$ rotates every vector in the plane counterclockwise by the angle $\theta$:

$$Q = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \quad\text{and}\quad Q^T = Q^{-1} = \begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix}.$$

The columns of $Q$ are orthogonal (take their dot product). They are unit vectors because $\sin^2\theta + \cos^2\theta = 1$. Those columns give an orthonormal basis for the plane $\mathbf{R}^2$. The standard basis vectors $i$ and $j$ are rotated through $\theta$ (see Figure 4.10a). $Q^{-1}$ rotates vectors back through $-\theta$. It agrees with $Q^T$, because the cosine of $-\theta$ is the cosine of $\theta$, and $\sin(-\theta) = -\sin\theta$. We have $Q^TQ = I$ and $QQ^T = I$.


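Example 2 (Permutation) These matrices change the order to $(y, z, x)$ and $(y, x)$:

$$\begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{bmatrix}\begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} y \\ z \\ x \end{bmatrix} \quad\text{and}\quad \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} y \\ x \end{bmatrix}.$$

All columns of these $Q$'s are unit vectors (their lengths are obviously 1). They are also orthogonal (the 1's appear in different places). The inverse of a permutation matrix is its transpose. The inverse puts the components back into their original order:

$$\text{Inverse = transpose:}\qquad \begin{bmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix}\begin{bmatrix} y \\ z \\ x \end{bmatrix} = \begin{bmatrix} x \\ y \\ z \end{bmatrix} \quad\text{and}\quad \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}\begin{bmatrix} y \\ x \end{bmatrix} = \begin{bmatrix} x \\ y \end{bmatrix}.$$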


Example 3 (Reflection) If $u$ is any unit vector, set $Q = I - 2uu^T$. Notice that $uu^T$ is a matrix while $u^Tu$ is the number $\|u\|^2 = 1$. Then $Q^T$ and $Q^{-1}$ both equal $Q$:

$$Q^T = I - 2uu^T = Q \quad\text{and}\quad Q^TQ = I - 4uu^T + 4uu^Tuu^T = I. \tag{2}$$

Reflection matrices $I - 2uu^T$ are symmetric and also orthogonal. If you square them, you get the identity matrix: $Q^2 = Q^TQ = I$. Reflecting twice through a mirror brings back the original. Notice $u^Tu = 1$ inside $4uu^Tuu^T$ in equation (2).

¹ “Orthonormal matrix” would have been a better name for $Q$, but it's not used. Any matrix with orthonormal columns has the letter $Q$, but we only call it an orthogonal matrix when it is square.

Figure 4.10: Rotation by $Q = \begin{bmatrix} c & -s \\ s & c \end{bmatrix}$ and reflection across 45° by $Q = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$.


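As examples choose two unit vectors, $u = (1, 0)$ and then $u = (1/\sqrt{2}, -1/\sqrt{2})$. Compute $2uu^T$ (column times row) and subtract from $I$ to get reflections $Q_1$ and $Q_2$:

$$Q_1 = I - 2\begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} = \begin{bmatrix} -1 & 0 \\ 0 & 1 \end{bmatrix} \quad\text{and}\quad Q_2 = I - 2\begin{bmatrix} .5 & -.5 \\ -.5 & .5 \end{bmatrix} = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}.$$

$Q_1$ reflects $(x, 0)$ across the $y$ axis to $(-x, 0)$. Every vector $(x, y)$ goes into its image $(-x, y)$, and the $y$ axis is the mirror. $Q_2$ is reflection across the 45° line:

$$\text{Reflection from } (x, y) \text{ to } (y, x): \qquad \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} y \\ x \end{bmatrix}.$$

When $(x, y)$ goes to $(y, x)$, a vector like $(3, 3)$ doesn't move. It is on the mirror line. Figure 4.10b shows the 45° mirror.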

Rotations preserve the length of a vector. So do reflections. So do permutations. So does multiplication by any orthogonal matrix: lengths and angles don't change.

If $Q$ has orthonormal columns ($Q^TQ = I$), it leaves lengths unchanged:

$$\text{Same length}\qquad \|Qx\| = \|x\| \text{ for every vector } x. \tag{3}$$

$Q$ also preserves dot products: $(Qx)^T(Qy) = x^TQ^TQy = x^Ty$. Just use $Q^TQ = I$.

Proof $\|Qx\|^2$ equals $\|x\|^2$ because $(Qx)^T(Qx) = x^TQ^TQx = x^TIx = x^Tx$. Orthogonal matrices are excellent for computations: numbers can never grow too large when lengths of vectors are fixed. Stable computer codes use $Q$'s as much as possible.
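The test $Q^TQ = I$ and the length rule are easy to check by machine. The sketch below uses plain Python lists; the helper names, the angle 0.7, the test vector $(3, 4)$ and the 3 by 2 matrix `tall` are all choices made for this illustration.

```python
import math

def transpose(M):
    return [list(col) for col in zip(*M)]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def is_identity(M, tol=1e-12):
    """True when M is square and equal to I up to rounding error."""
    n = len(M)
    return all(len(row) == n for row in M) and all(
        abs(M[i][j] - (1.0 if i == j else 0.0)) <= tol
        for i in range(n) for j in range(n))

def length(x):
    return math.sqrt(sum(v * v for v in x))

def apply(Q, x):
    return [sum(q * v for q, v in zip(row, x)) for row in Q]

theta = 0.7                                          # any angle works
rotation = [[math.cos(theta), -math.sin(theta)],
            [math.sin(theta), math.cos(theta)]]
permutation = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]
u = [1 / math.sqrt(2), -1 / math.sqrt(2)]            # unit vector for the reflection
reflection = [[(1.0 if i == j else 0.0) - 2 * u[i] * u[j] for j in range(2)]
              for i in range(2)]
tall = [[1, 0], [0, 1], [0, 0]]                      # rectangular, orthonormal columns

for Q in (rotation, permutation, reflection, tall):
    # first flag tests Q^T Q = I, second flag tests Q Q^T = I
    print(is_identity(matmul(transpose(Q), Q)), is_identity(matmul(Q, transpose(Q))))

x = [3.0, 4.0]
print(length(x), length(apply(rotation, x)), length(apply(reflection, x)))
```

The three square matrices pass both tests. The rectangular `tall` passes only the first one, which is the statement that $Q^T$ is an inverse from the left only.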


Projections Using Orthogonal Bases: Q Replaces A 


This chapter is about projections onto subspaces. We developed the equations for $\hat{x}$ and $p$ and the matrix $P$. When the columns of $A$ were a basis for the subspace, all formulas involved $A^TA$. The entries of $A^TA$ are the dot products $a_i^Ta_j$.

Suppose the basis vectors are actually orthonormal. The $a$'s become $q$'s. Then $A^TA$ simplifies to $Q^TQ = I$. Look at the improvements in $\hat{x}$ and $p$ and $P$. Instead of $Q^TQ$ we print a blank for the identity matrix:

$$\hat{x} = Q^Tb \quad\text{and}\quad p = Q\hat{x} \quad\text{and}\quad P = QQ^T. \tag{4}$$

The least squares solution of $Qx = b$ is $\hat{x} = Q^Tb$. The projection matrix is $P = QQ^T$.

There are no matrices to invert. This is the point of an orthonormal basis. The best $\hat{x} = Q^Tb$ just has dot products of $q_1, \ldots, q_n$ with $b$. We have $n$ 1-dimensional projections! The “coupling matrix” or “correlation matrix” $A^TA$ is now $Q^TQ = I$. There is no coupling. When $A$ is $Q$, with orthonormal columns, here is $p = Q\hat{x} = QQ^Tb$:

$$\text{Projection onto } q\text{'s}\qquad p = \begin{bmatrix} q_1 & \cdots & q_n \end{bmatrix}\begin{bmatrix} q_1^Tb \\ \vdots \\ q_n^Tb \end{bmatrix} = q_1(q_1^Tb) + \cdots + q_n(q_n^Tb). \tag{5}$$


Important case: When $Q$ is square and $m = n$, the subspace is the whole space. Then $Q^T = Q^{-1}$ and $\hat{x} = Q^Tb$ is the same as $x = Q^{-1}b$. The solution is exact! The projection of $b$ onto the whole space is $b$ itself. In this case $P = QQ^T = I$.

You may think that projection onto the whole space is not worth mentioning. But when $p = b$, our formula assembles $b$ out of its 1-dimensional projections. If $q_1, \ldots, q_n$ is an orthonormal basis for the whole space, so $Q$ is square, then every $b = QQ^Tb$ is the sum of its components along the $q$'s:

$$b = q_1(q_1^Tb) + q_2(q_2^Tb) + \cdots + q_n(q_n^Tb). \tag{6}$$

That is $QQ^T = I$. It is the foundation of Fourier series and all the great “transforms” of applied mathematics. They break vectors or functions into perpendicular pieces. Then by adding the pieces, the inverse transform puts the function back together.


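Example 4 The columns of this orthogonal $Q$ are orthonormal vectors $q_1, q_2, q_3$:

$$Q = \frac{1}{3}\begin{bmatrix} -1 & 2 & 2 \\ 2 & -1 & 2 \\ 2 & 2 & -1 \end{bmatrix} \quad\text{has}\quad Q^TQ = QQ^T = I.$$

The separate projections of $b = (0, 0, 1)$ onto $q_1$ and $q_2$ and $q_3$ are $p_1$ and $p_2$ and $p_3$:

$$q_1(q_1^Tb) = \tfrac{2}{3}q_1 \quad\text{and}\quad q_2(q_2^Tb) = \tfrac{2}{3}q_2 \quad\text{and}\quad q_3(q_3^Tb) = -\tfrac{1}{3}q_3.$$

The sum of the first two is the projection of $b$ onto the plane of $q_1$ and $q_2$. The sum of all three is the projection of $b$ onto the whole space, which is $b$ itself:

$$\text{Reconstruct } b = p_1 + p_2 + p_3: \qquad \tfrac{2}{3}q_1 + \tfrac{2}{3}q_2 - \tfrac{1}{3}q_3 = \frac{1}{9}\begin{bmatrix} -2 + 4 - 2 \\ 4 - 2 - 2 \\ 4 + 4 + 1 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix} = b.$$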



The Gram-Schmidt Process 


The point of this section is that “orthogonal is good.” Projections and least squares always involve $A^TA$. When this matrix becomes $Q^TQ = I$, the inverse is no problem. The one-dimensional projections are uncoupled. The best $\hat{x}$ is $Q^Tb$ (just $n$ separate dot products). For this to be true, we had to say “If the vectors are orthonormal”. Now we find a way to create orthonormal vectors.

Start with three independent vectors $a, b, c$. We intend to construct three orthogonal vectors $A, B, C$. Then (at the end is easiest) we divide $A, B, C$ by their lengths. That produces three orthonormal vectors $q_1 = A/\|A\|$, $q_2 = B/\|B\|$, $q_3 = C/\|C\|$.


Gram-Schmidt Begin by choosing $A = a$. This first direction is accepted. The next direction $B$ must be perpendicular to $A$. Start with $b$ and subtract its projection along $A$. This leaves the perpendicular part, which is the orthogonal vector $B$:

$$\text{First Gram-Schmidt step}\qquad B = b - \frac{A^Tb}{A^TA}A. \tag{7}$$

$A$ and $B$ are orthogonal in Figure 4.11. Take the dot product with $A$ to verify that $A^TB = A^Tb - A^Tb = 0$. This vector $B$ is what we have called the error vector $e$, perpendicular to $A$. Notice that $B$ in equation (7) is not zero (otherwise $a$ and $b$ would be dependent). The directions $A$ and $B$ are now set.

The third direction starts with $c$. This is not a combination of $A$ and $B$ (because $c$ is not a combination of $a$ and $b$). But most likely $c$ is not perpendicular to $A$ and $B$. So subtract off its components in those two directions to get $C$:

$$\text{Next Gram-Schmidt step}\qquad C = c - \frac{A^Tc}{A^TA}A - \frac{B^Tc}{B^TB}B. \tag{8}$$

Figure 4.11: First project $b$ onto the line through $a$ and find the orthogonal $B$ as $b - p$. Then project $c$ onto the $AB$ plane and find $C$ as $c - p$. Divide by $\|A\|$, $\|B\|$, $\|C\|$.

This is the one and only idea of the Gram-Schmidt process. Subtract from every new vector its projections in the directions already set. That idea is repeated at every step.² If we had a fourth vector $d$, we would subtract three projections onto $A, B, C$ to get $D$. At the end, or immediately when each one is found, divide the orthogonal vectors $A, B, C, D$ by their lengths. The resulting vectors $q_1, q_2, q_3, q_4$ are orthonormal.


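Example 5 Suppose the independent non-orthogonal vectors $a, b, c$ are

$$a = \begin{bmatrix} 1 \\ -1 \\ 0 \end{bmatrix} \quad\text{and}\quad b = \begin{bmatrix} 2 \\ 0 \\ -2 \end{bmatrix} \quad\text{and}\quad c = \begin{bmatrix} 3 \\ -3 \\ 3 \end{bmatrix}.$$

Then $A = a$ has $A^TA = 2$. Subtract from $b$ its projection along $A = (1, -1, 0)$:

$$\text{First step}\qquad B = b - \frac{A^Tb}{A^TA}A = b - \frac{2}{2}A = \begin{bmatrix} 1 \\ 1 \\ -2 \end{bmatrix}.$$

Check: $A^TB = 0$ as required. Now subtract two projections from $c$ to get $C$:

$$\text{Next step}\qquad C = c - \frac{A^Tc}{A^TA}A - \frac{B^Tc}{B^TB}B = c - \frac{6}{2}A + \frac{6}{6}B = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}.$$

Check: $C = (1, 1, 1)$ is perpendicular to $A$ and $B$. Finally convert $A, B, C$ to unit vectors (length 1, orthonormal). The lengths of $A, B, C$ are $\sqrt{2}$ and $\sqrt{6}$ and $\sqrt{3}$. Divide by those lengths, for an orthonormal basis:

$$q_1 = \frac{1}{\sqrt{2}}\begin{bmatrix} 1 \\ -1 \\ 0 \end{bmatrix} \quad\text{and}\quad q_2 = \frac{1}{\sqrt{6}}\begin{bmatrix} 1 \\ 1 \\ -2 \end{bmatrix} \quad\text{and}\quad q_3 = \frac{1}{\sqrt{3}}\begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}.$$

Usually $A, B, C$ contain fractions. Almost always $q_1, q_2, q_3$ contain square roots.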
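The two steps of Example 5 follow one pattern, which a short routine can repeat for any number of vectors. The sketch below stops before the division by lengths so that exact fractions can be used; `orthogonalize` is a hypothetical name, and the routine follows equations (7) and (8) by subtracting projections of the original vector.

```python
from fractions import Fraction

def dot(x, y):
    return sum(xi * yi for xi, yi in zip(x, y))

def orthogonalize(vectors):
    """Gram-Schmidt without normalization: returns orthogonal A, B, C, ...

    Each new vector loses its projections onto the directions already set,
    computed from the original vector as in equations (7) and (8).
    """
    basis = []
    for v in vectors:
        w = [Fraction(x) for x in v]
        for u in basis:
            coefficient = dot(u, v) / dot(u, u)        # (u^T v) / (u^T u)
            w = [wi - coefficient * ui for wi, ui in zip(w, u)]
        basis.append(w)
    return basis

a, b, c = [1, -1, 0], [2, 0, -2], [3, -3, 3]
A, B, C = orthogonalize([a, b, c])
print(A, B, C)
print(dot(A, A), dot(B, B), dot(C, C))   # squared lengths 2, 6, 3
```

The squared lengths 2, 6, 3 are where the square roots $\sqrt{2}$, $\sqrt{6}$, $\sqrt{3}$ come from when $A, B, C$ are divided by their lengths.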


The Factorization A = QR 


We started with a matrix $A$, whose columns were $a, b, c$. We ended with a matrix $Q$, whose columns are $q_1, q_2, q_3$. How are those matrices related? Since the vectors $a, b, c$ are combinations of the $q$'s (and vice versa), there must be a third matrix connecting $A$ to $Q$. This third matrix is the triangular $R$ in $A = QR$.

The first step was $q_1 = a/\|a\|$ (other vectors not involved). The second step was equation (7), where $b$ is a combination of $A$ and $B$. At that stage $C$ and $q_3$ were not involved. This non-involvement of later vectors is the key point of Gram-Schmidt:

² I think Gram had the idea. I don't really know where Schmidt came in.

• The vectors $a$ and $A$ and $q_1$ are all along a single line.
• The vectors $a, b$ and $A, B$ and $q_1, q_2$ are all in the same plane.
• The vectors $a, b, c$ and $A, B, C$ and $q_1, q_2, q_3$ are in one subspace (dimension 3).

At every step $a_1, \ldots, a_k$ are combinations of $q_1, \ldots, q_k$. Later $q$'s are not involved. The connecting matrix $R$ is triangular, and we have $A = QR$:

$$\begin{bmatrix} a & b & c \end{bmatrix} = \begin{bmatrix} q_1 & q_2 & q_3 \end{bmatrix}\begin{bmatrix} q_1^Ta & q_1^Tb & q_1^Tc \\ & q_2^Tb & q_2^Tc \\ & & q_3^Tc \end{bmatrix} \quad\text{or}\quad A = QR. \tag{9}$$

$A = QR$ is Gram-Schmidt in a nutshell. Multiply by $Q^T$ to see why $R = Q^TA$.


Here are the $a$'s and $q$'s from the example. The $i, j$ entry of $R = Q^TA$ is row $i$ of $Q^T$ times column $j$ of $A$. This is the dot product of $q_i$ with $a_j$:

$$A = \begin{bmatrix} 1 & 2 & 3 \\ -1 & 0 & -3 \\ 0 & -2 & 3 \end{bmatrix} = \begin{bmatrix} 1/\sqrt{2} & 1/\sqrt{6} & 1/\sqrt{3} \\ -1/\sqrt{2} & 1/\sqrt{6} & 1/\sqrt{3} \\ 0 & -2/\sqrt{6} & 1/\sqrt{3} \end{bmatrix}\begin{bmatrix} \sqrt{2} & \sqrt{2} & \sqrt{18} \\ 0 & \sqrt{6} & -\sqrt{6} \\ 0 & 0 & \sqrt{3} \end{bmatrix} = QR.$$

The lengths of $A, B, C$ are the numbers $\sqrt{2}, \sqrt{6}, \sqrt{3}$ on the diagonal of $R$. Because of the square roots, $QR$ looks less beautiful than $LU$. Both factorizations are absolutely central to calculations in linear algebra.

Any $m$ by $n$ matrix $A$ with independent columns can be factored into $QR$. The $m$ by $n$ matrix $Q$ has orthonormal columns, and the square matrix $R$ is upper triangular with positive diagonal. We must not forget why this is useful for least squares: $A^TA$ equals $R^TQ^TQR = R^TR$. The least squares equation $A^TA\hat{x} = A^Tb$ simplifies to $R\hat{x} = Q^Tb$:

$$\text{Least squares}\qquad R^TR\hat{x} = R^TQ^Tb \quad\text{or}\quad R\hat{x} = Q^Tb \quad\text{or}\quad \hat{x} = R^{-1}Q^Tb. \tag{10}$$

Instead of solving $Ax = b$, which is impossible, we solve $R\hat{x} = Q^Tb$ by back substitution, which is very fast. The real cost is the $mn^2$ multiplications in the Gram-Schmidt process, which are needed to construct the orthogonal $Q$ and the triangular $R$.


Below is an informal code. It executes equations (11) and (12), for $k = 1$ then $k = 2$ and eventually $k = n$. The last line of that code normalizes to unit vectors $q_j$:

$$\text{Divide by length } (q_j = \text{unit vector})\qquad r_{jj} = \Big(\sum_{i=1}^{m} v_{ij}^2\Big)^{1/2} \quad\text{and}\quad q_{ij} = \frac{v_{ij}}{r_{jj}} \quad\text{for } i = 1, \ldots, m. \tag{11}$$

The important lines subtract from $v = a_j$ its projection onto each $q_k$:

$$r_{kj} = \sum_{i=1}^{m} q_{ik}v_{ij} \quad\text{and}\quad v_{ij} = v_{ij} - q_{ik}r_{kj}. \tag{12}$$

Starting from $a, b, c = a_1, a_2, a_3$ this code will construct $q_1$, then $B, q_2$, then $C, q_3$:

$$\begin{aligned} q_1 &= a_1/\|a_1\| & B &= a_2 - (q_1^Ta_2)q_1 & q_2 &= B/\|B\| \\ C^* &= a_3 - (q_1^Ta_3)q_1 & C &= C^* - (q_2^TC^*)q_2 & q_3 &= C/\|C\| \end{aligned}$$

Equation (12) subtracts off projections as soon as the new vector $q_k$ is found. This change to “subtract one projection at a time” is called modified Gram-Schmidt. That is numerically more stable than equation (8) which subtracts all projections at once.


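    for j = 1:n                        % modified Gram-Schmidt
        v = A(:, j);                   % v begins as column j of A
        for i = 1:j-1                  % columns q_1 to q_(j-1) are already settled in Q
            R(i, j) = Q(:, i)' * v;    % dot product of q_i with the current v
            v = v - R(i, j) * Q(:, i); % subtract the projection (q_i' * v) q_i
        end                            % v is now perpendicular to q_1, ..., q_(j-1)
        R(j, j) = norm(v);             % diagonal entries of R are lengths
        Q(:, j) = v / R(j, j);         % divide v by its length to get the unit vector q_j
    end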


To recover column $j$ of $A$, undo the last step and the middle steps of the code:

$$R(j, j)\,q_j = (v \text{ minus its projections}) = (\text{column } j \text{ of } A) - \sum_{i=1}^{j-1} R(i, j)\,q_i. \tag{13}$$

Moving the sum to the far left, this is column $j$ in the multiplication $A = QR$.
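For readers who prefer Python, here is a rough translation of the informal code into plain lists, followed by the least squares step $R\hat{x} = Q^Tb$. It is a sketch under stated assumptions: the function names are invented, matrices are lists of rows, the columns are assumed independent, and the line-fitting data are the four points of Problem Set 4.3.

```python
import math

def modified_gram_schmidt(A):
    """Return Q (orthonormal columns) and upper triangular R with A = Q R."""
    m, n = len(A), len(A[0])
    Q = [[0.0] * n for _ in range(m)]
    R = [[0.0] * n for _ in range(n)]
    for j in range(n):
        v = [float(A[k][j]) for k in range(m)]               # v begins as column j of A
        for i in range(j):
            R[i][j] = sum(Q[k][i] * v[k] for k in range(m))  # q_i^T v with the current v
            v = [v[k] - R[i][j] * Q[k][i] for k in range(m)] # subtract one projection
        R[j][j] = math.sqrt(sum(x * x for x in v))           # length of what is left
        for k in range(m):
            Q[k][j] = v[k] / R[j][j]                         # normalize to q_j
    return Q, R

def back_substitute(R, c):
    """Solve the upper triangular system R x = c from the bottom up."""
    n = len(c)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (c[i] - sum(R[i][j] * x[j] for j in range(i + 1, n))) / R[i][i]
    return x

# The matrix from Example 5: the diagonal of R holds sqrt(2), sqrt(6), sqrt(3).
A = [[1, 2, 3], [-1, 0, -3], [0, -2, 3]]
Q, R = modified_gram_schmidt(A)
print([round(R[i][i] ** 2, 10) for i in range(3)])

# Least squares line through (0, 0), (1, 8), (3, 8), (4, 20): solve R x = Q^T b.
T = [[1, 0], [1, 1], [1, 3], [1, 4]]
b = [0, 8, 8, 20]
Q2, R2 = modified_gram_schmidt(T)
Qtb = [sum(Q2[k][i] * b[k] for k in range(4)) for i in range(2)]
C, D = back_substitute(R2, Qtb)
print(round(C, 10), round(D, 10))
```

Production libraries take a different route to the same factorization (see the Confession that follows), so this translation is for understanding rather than for serious computation.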


Confession Good software like LAPACK, used in good systems like MATLAB and Octave and Python, will not use this Gram-Schmidt code. There is now a better way. “Householder reflections” produce the upper triangular $R$, one column at a time, exactly as elimination produces the upper triangular $U$.

Those reflection matrices $I - 2uu^T$ will be described in Chapter 9 on numerical linear algebra. If $A$ is tridiagonal we can simplify even more to use 2 by 2 rotations. The result is always $A = QR$ and the MATLAB command is [Q, R] = qr(A). I believe that Gram-Schmidt is still the good process to understand, even if the reflections or rotations lead to a more perfect $Q$.




■ REVIEW OF THE KEY IDEAS ■

1. If the orthonormal vectors $q_1, \ldots, q_n$ are the columns of $Q$, then $q_i^Tq_j = 0$ and $q_i^Tq_i = 1$ translate into $Q^TQ = I$.

2. If $Q$ is square (an orthogonal matrix) then $Q^T = Q^{-1}$: transpose = inverse.

3. The length of $Qx$ equals the length of $x$: $\|Qx\| = \|x\|$.

4. The projection onto the column space spanned by the $q$'s is $P = QQ^T$.

5. If $Q$ is square then $P = I$ and every $b = q_1(q_1^Tb) + \cdots + q_n(q_n^Tb)$.

6. Gram-Schmidt produces orthonormal vectors $q_1, q_2, q_3$ from independent $a, b, c$. In matrix form this is the factorization $A = QR$ = (orthogonal $Q$)(triangular $R$).


■ WORKED EXAMPLES ■

4.4 A Add two more columns with all entries 1 or −1, so the columns of this 4 by 4 “Hadamard matrix” are orthogonal. How do you turn $H_4$ into an orthogonal matrix $Q$?

$$H_2 = \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} \qquad H_4 = \begin{bmatrix} 1 & 1 & x & x \\ 1 & -1 & x & x \\ 1 & 1 & x & x \\ 1 & -1 & x & x \end{bmatrix} \quad\text{and}\quad Q_4 = \begin{bmatrix} \quad \end{bmatrix}$$

The block matrix $H_8 = \begin{bmatrix} H_4 & H_4 \\ H_4 & -H_4 \end{bmatrix}$ is the next Hadamard matrix with 1's and −1's. What is the product $H_8^TH_8$?

The projection of $b = (6, 0, 0, 2)$ onto the first column of $H_4$ is $p_1 = (2, 2, 2, 2)$. The projection onto the second column is $p_2 = (1, -1, 1, -1)$. What is the projection $p_{1,2}$ of $b$ onto the 2-dimensional space spanned by the first two columns?


Solution $H_4$ can be built from $H_2$ just as $H_8$ is built from $H_4$:

$$H_4 = \begin{bmatrix} H_2 & H_2 \\ H_2 & -H_2 \end{bmatrix} = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & -1 & 1 & -1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \end{bmatrix} \quad\text{has orthogonal columns.}$$

Then $Q = H/2$ has orthonormal columns. Dividing by 2 gives unit vectors in $Q$. Orthogonality for 5 by 5 is impossible because the dot product of columns would have five 1's and/or −1's and could not add to zero. $H_8$ has orthogonal columns of length $\sqrt{8}$.

$$H_8^TH_8 = \begin{bmatrix} H^T & H^T \\ H^T & -H^T \end{bmatrix}\begin{bmatrix} H & H \\ H & -H \end{bmatrix} = \begin{bmatrix} 2H^TH & 0 \\ 0 & 2H^TH \end{bmatrix} = \begin{bmatrix} 8I & 0 \\ 0 & 8I \end{bmatrix}. \qquad Q_8 = \frac{H_8}{\sqrt{8}}$$

Key point of orthogonal columns: We can project $(6, 0, 0, 2)$ onto $(1, 1, 1, 1)$ and $(1, -1, 1, -1)$ and add. There is no $A^TA$ matrix to invert:

Add $p$'s: Projection $p_{1,2} = p_1 + p_2 = (2, 2, 2, 2) + (1, -1, 1, -1) = (3, 1, 3, 1)$.

Check that columns $a_1$ and $a_2$ of $H$ are perpendicular to the error $e = b - p_1 - p_2$:

$$e = b - \frac{a_1^Tb}{a_1^Ta_1}a_1 - \frac{a_2^Tb}{a_2^Ta_2}a_2 \quad\text{and}\quad a_1^Te = a_1^Tb - \frac{a_1^Tb}{a_1^Ta_1}a_1^Ta_1 = 0 \quad\text{and also}\quad a_2^Te = 0.$$

So $p_1 + p_2$ is in the space of $a_1$ and $a_2$, and its error $e$ is perpendicular to that space.

The Gram-Schmidt process on those orthogonal columns $a_1$ and $a_2$ would not change their directions. It would only divide by their lengths. But if $a_1$ and $a_2$ are not orthogonal, the projection $p_{1,2}$ is not generally $p_1 + p_2$. For example, if $b$ is the same as $a_1$ then $p_1 = b$ and $p_{1,2} = b$ but $p_2 \neq 0$.
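The block construction and the "project and add" rule can both be verified in a few lines. This is an illustrative sketch only; `next_hadamard`, `gram` and `project_onto` are names made up for the example.

```python
from fractions import Fraction

def next_hadamard(H):
    """Build [[H, H], [H, -H]] from a Hadamard matrix H given as a list of rows."""
    top = [row + row for row in H]
    bottom = [row + [-v for v in row] for row in H]
    return top + bottom

def gram(H):
    """H^T H: all dot products between columns of H."""
    columns = list(zip(*H))
    return [[sum(x * y for x, y in zip(ci, cj)) for cj in columns] for ci in columns]

def project_onto(a, b):
    """Projection of b onto the line through a."""
    coefficient = Fraction(sum(x * y for x, y in zip(a, b)), sum(x * x for x in a))
    return [coefficient * x for x in a]

H2 = [[1, 1], [1, -1]]
H4 = next_hadamard(H2)
H8 = next_hadamard(H4)
print(gram(H8) == [[8 if i == j else 0 for j in range(8)] for i in range(8)])  # True

b = [6, 0, 0, 2]
a1, a2 = [row[0] for row in H4], [row[1] for row in H4]   # first two columns of H4
p1, p2 = project_onto(a1, b), project_onto(a2, b)
p12 = [x + y for x, y in zip(p1, p2)]
e = [bi - pi for bi, pi in zip(b, p12)]
print(p12)   # the entries 3, 1, 3, 1
print(sum(x * y for x, y in zip(a1, e)), sum(x * y for x, y in zip(a2, e)))   # 0 0
```

Adding the two one-dimensional projections is legitimate here only because the columns are orthogonal, which is the key point of the worked example.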


Problem Set 4.4 


Problems 1-12 are about orthogonal vectors and orthogonal matrices. 


1 Are these pairs of vectors orthonormal or only orthogonal or only independent? 


1 —1 6 4 cos 6 —sin ð 
(a) fo] and | i (b) [$] and -3l (c) ine | and | cos 4 ' 
Change the second vector when necessary to produce orthonormal vectors. 


2 The vectors (2,2, —1) and (—1, 2, 2) are orthogonal. Divide them by their lengths to 
find orthonormal vectors g, and q3. Put those into the columns of Q and multiply 


QTQ and Q QT. 
3 (a) If A has three orthogonal columns each of length 4, what is ATA? 
(b) If A has three orthogonal columns of lengths 1,2,3, what is ATA? 


4 Give an example of each of the following: 


(a) A matrix Q that has orthonormal columns but Q QT Æ I. 
(b) Two orthogonal vectors that are not linearly independent. 
(c) An orthonormal basis for R?, including the vector g, = (1, 1, 1)/ v3. 


5 Find two orthogonal vectors in the plane x + y + 2z = 0. Make them orthonormal. 
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If Qı and Qz are orthogonal matrices, show that their product Q; Q2 is also an 
orthogonal matrix. (Use QTQ = 1.) 


If Q has orthonormal columns, what is the least squares solution ¥ to Qx = b? 


If q, and q, are orthonormal vectors in R, what combination 
is closest to a given vector b? 


qı + q2 


9 (a) Compute $P = QQ^T$ when $q_1 = (.8, .6, 0)$ and $q_2 = (-.6, .8, 0)$. Verify that
$P^2 = P$.

(b) Prove that always $(QQ^T)^2 = QQ^T$ by using $Q^TQ = I$. Then $P = QQ^T$ is
the projection matrix onto the column space of $Q$.

10 Orthonormal vectors are automatically linearly independent.

(a) Vector proof: When $c_1q_1 + c_2q_2 + c_3q_3 = 0$, what dot product leads to $c_1 = 0$?
Similarly $c_2 = 0$ and $c_3 = 0$. Thus the $q$'s are independent.

(b) Matrix proof: Show that $Qx = 0$ leads to $x = 0$. Since $Q$ may be rectangular,
you can use $Q^T$ but not $Q^{-1}$.

11 (a) Gram-Schmidt: Find orthonormal vectors $q_1$ and $q_2$ in the plane spanned by
$a = (1, 3, 4, 5, 7)$ and $b = (-6, 6, 8, 0, 8)$.

(b) Which vector in this plane is closest to (1, 0, 0, 0, 0)?

12 If $a_1, a_2, a_3$ is a basis for $\mathbf{R}^3$, any vector $b$ can be written as

$$b = x_1a_1 + x_2a_2 + x_3a_3 \quad\text{or}\quad \begin{bmatrix} a_1 & a_2 & a_3 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = b.$$

(a) Suppose the $a$'s are orthonormal. Show that $x_1 = a_1^Tb$.
(b) Suppose the $a$'s are orthogonal. Show that $x_1 = a_1^Tb / a_1^Ta_1$.
(c) If the $a$'s are independent, $x_1$ is the first component of ___ times $b$.


Problems 13-25 are about the Gram-Schmidt process and A = QR. 


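13 What multiple of $a = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$ should be subtracted from $b = \begin{bmatrix} 4 \\ 0 \end{bmatrix}$ to make the result $B$
orthogonal to $a$? Sketch a figure to show $a$, $b$, and $B$.

14 Complete the Gram-Schmidt process in Problem 13 by computing $q_1 = a/\|a\|$ and
$q_2 = B/\|B\|$ and factoring into $QR$:

$$\begin{bmatrix} 1 & 4 \\ 1 & 0 \end{bmatrix} = \begin{bmatrix} q_1 & q_2 \end{bmatrix} \begin{bmatrix} \|a\| & ? \\ 0 & \|B\| \end{bmatrix}.$$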

15 (a) Find orthonormal vectors $q_1, q_2, q_3$ such that $q_1, q_2$ span the column space of

$$A = \begin{bmatrix} 1 & 1 \\ 2 & -1 \\ -2 & 4 \end{bmatrix}.$$

(b) Which of the four fundamental subspaces contains $q_3$?

(c) Solve $Ax = (1, 2, 7)$ by least squares.

16 What multiple of $a = (4, 5, 2, 2)$ is closest to $b = (1, 2, 0, 0)$? Find orthonormal
vectors $q_1$ and $q_2$ in the plane of $a$ and $b$.

17 Find the projection of $b$ onto the line through $a$:

$$a = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} \quad\text{and}\quad b = \begin{bmatrix} 1 \\ 3 \\ 5 \end{bmatrix} \quad\text{and}\quad p = \;? \quad\text{and}\quad e = b - p = \;?$$

Compute the orthonormal vectors $q_1 = a/\|a\|$ and $q_2 = e/\|e\|$.

18 (Recommended) Find orthogonal vectors $A, B, C$ by Gram-Schmidt from $a, b, c$:

$a = (1, -1, 0, 0) \quad b = (0, 1, -1, 0) \quad c = (0, 0, 1, -1)$.

$A, B, C$ and $a, b, c$ are bases for the vectors perpendicular to $d = (1, 1, 1, 1)$.

19 If $A = QR$ then $A^TA = R^TR$ = ___ triangular times ___ triangular.
Gram-Schmidt on $A$ corresponds to elimination on $A^TA$. The pivots for $A^TA$ must
be the squares of diagonal entries of $R$. Find $Q$ and $R$ by Gram-Schmidt for this $A$:

$$A = \begin{bmatrix} -1 & 1 \\ 2 & 1 \\ 2 & 4 \end{bmatrix} \quad\text{and}\quad A^TA = \begin{bmatrix} 9 & 9 \\ 9 & 18 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix} \begin{bmatrix} 9 & 9 \\ 0 & 9 \end{bmatrix}.$$

20 True or false (give an example in either case):

(a) $Q^{-1}$ is an orthogonal matrix when $Q$ is an orthogonal matrix.

(b) If $Q$ (3 by 2) has orthonormal columns then $\|Qx\|$ always equals $\|x\|$.

21 Find an orthonormal basis for the column space of $A$:

$$A = \begin{bmatrix} 1 & -2 \\ 1 & 0 \\ 1 & 1 \\ 1 & 3 \end{bmatrix} \quad\text{and}\quad b = \begin{bmatrix} -4 \\ -3 \\ 3 \\ 0 \end{bmatrix}.$$

Then compute the projection of $b$ onto that column space.

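22 Find orthogonal vectors $A, B, C$ by Gram-Schmidt from

$$a = \begin{bmatrix} 1 \\ 1 \\ 2 \end{bmatrix} \quad\text{and}\quad b = \begin{bmatrix} 1 \\ -1 \\ 0 \end{bmatrix} \quad\text{and}\quad c = \begin{bmatrix} 1 \\ 0 \\ 4 \end{bmatrix}.$$

23 Find $q_1, q_2, q_3$ (orthonormal) as combinations of $a, b, c$ (independent columns).
Then write $A$ as $QR$:

$$A = \begin{bmatrix} 1 & 2 & 4 \\ 0 & 0 & 5 \\ 0 & 3 & 6 \end{bmatrix}.$$

24 (a) Find a basis for the subspace $S$ in $\mathbf{R}^4$ spanned by all solutions of
$x_1 + x_2 + x_3 - x_4 = 0$.
(b) Find a basis for the orthogonal complement $S^\perp$.
(c) Find $b_1$ in $S$ and $b_2$ in $S^\perp$ so that $b_1 + b_2 = b = (1, 1, 1, 1)$.

25 If $ad - bc > 0$, the entries in $A = QR$ are

$$\begin{bmatrix} a & b \\ c & d \end{bmatrix} = \frac{\begin{bmatrix} a & -c \\ c & a \end{bmatrix}}{\sqrt{a^2 + c^2}} \; \frac{\begin{bmatrix} a^2 + c^2 & ab + cd \\ 0 & ad - bc \end{bmatrix}}{\sqrt{a^2 + c^2}}.$$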


Write A = QR when a,b,c,d = 2,1,1,1 and also 1,1,1,1. Which entry of R 
becomes zero when the columns are dependent and Gram-Schmidt breaks down? 


Problems 26-29 use the QR code in equations (11-12). It executes Gram-Schmidt.

26 Show why $C$ (found via $C^*$ in the steps after (12)) is equal to $C$ in equation (8).

27 Equation (8) subtracts from $c$ its components along $A$ and $B$. Why not subtract the
components along $a$ and along $b$?

28 Where are the $mn^2$ multiplications in equations (11) and (12)?

29 Apply the MATLAB qr code to $a = (2, 2, -1)$, $b = (0, -3, 3)$, $c = (1, 0, 0)$. What
are the $q$'s?


Problems 30-35 involve orthogonal matrices that are special. 


30 The first four wavelets are in the columns of this wavelet matrix W: 


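$$W = \frac{1}{2} \begin{bmatrix} 1 & 1 & \sqrt{2} & 0 \\ 1 & 1 & -\sqrt{2} & 0 \\ 1 & -1 & 0 & \sqrt{2} \\ 1 & -1 & 0 & -\sqrt{2} \end{bmatrix}.$$

What is special about the columns? Find the inverse wavelet transform $W^{-1}$.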

31 (a) Choose $c$ so that $Q$ is an orthogonal matrix:

$$Q = c \begin{bmatrix} 1 & -1 & -1 & -1 \\ -1 & 1 & -1 & -1 \\ -1 & -1 & 1 & -1 \\ -1 & -1 & -1 & 1 \end{bmatrix}.$$

Project $b = (1, 1, 1, 1)$ onto the first column. Then project $b$ onto the plane of the
first two columns.

32 If $u$ is a unit vector, then $Q = I - 2uu^T$ is a reflection matrix (Example 3). Find $Q_1$
from $u = (0, 1)$ and $Q_2$ from $u = (0, \sqrt{2}/2, \sqrt{2}/2)$. Draw the reflections when $Q_1$
and $Q_2$ multiply the vectors (1, 2) and (1, 1, 1).

33 Find all matrices that are both orthogonal and lower triangular.

34 $Q = I - 2uu^T$ is a reflection matrix when $u^Tu = 1$. Two reflections give $Q^2 = I$.

(a) Show that $Qu = -u$. The mirror is perpendicular to $u$.

(b) Find $Qv$ when $u^Tv = 0$. The mirror contains $v$. It reflects to itself.


Challenge Problems 


35 (MATLAB) Factor [Q,R] = qr(A) for A = eye(4) - diag([1 1 1],-1). You
are orthogonalizing the columns (1, -1, 0, 0) and (0, 1, -1, 0) and (0, 0, 1, -1) and
(0, 0, 0, 1) of A. Can you scale the orthogonal columns of Q to get nice integer
components?

36 If A is m by n with rank n, qr(A) produces a square Q and zeros below R:

The factors from MATLAB are (m by m)(m by n) $\quad A = \begin{bmatrix} Q_1 & Q_2 \end{bmatrix} \begin{bmatrix} R \\ 0 \end{bmatrix}$.

The n columns of $Q_1$ are an orthonormal basis for which fundamental subspace?
The m - n columns of $Q_2$ are an orthonormal basis for which fundamental subspace?

37 We know that $P = QQ^T$ is the projection onto the column space of Q (m by n).
Now add another column a to produce A = [Q a]. What is the new orthonormal
vector q from Gram-Schmidt: start with a, subtract ___, divide by ___.


Chapter 5 


Determinants 


5.1 The Properties of Determinants 


The determinant of a square matrix is a single number. That number contains an amazing
amount of information about the matrix. It tells immediately whether the matrix is
invertible. The determinant is zero when the matrix has no inverse. When A is invertible, the
determinant of $A^{-1}$ is 1/(det A). If det A = 2 then det $A^{-1} = \frac{1}{2}$. In fact the determinant
leads to a formula for every entry in $A^{-1}$.

This is one use for determinants: to find formulas for inverse matrices and pivots and
solutions $A^{-1}b$. For a large matrix we seldom use those formulas, because elimination is
faster. For a 2 by 2 matrix with entries a, b, c, d, its determinant ad - bc shows how $A^{-1}$
changes as A changes:

$$A = \begin{bmatrix} a & b \\ c & d \end{bmatrix} \quad\text{has inverse}\quad A^{-1} = \frac{1}{ad - bc} \begin{bmatrix} d & -b \\ -c & a \end{bmatrix}. \tag{1}$$


Multiply those matrices to get I. When the determinant is ad — bc = 0, we are asked to 
divide by zero and we can’t—then A has no inverse. (The rows are parallel when a/c = 
b/d. This gives ad = bc and det A = 0). Dependent rows always lead to det A = 0. 
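
Formula (1) can be tried on numbers. The following is a minimal sketch in Python, not part of the original text: the names `inverse_2x2` and `multiply_2x2` and the sample entries are assumptions made for illustration. Exact fractions are used so that the product comes out as I with no rounding.

```python
from fractions import Fraction

def inverse_2x2(a, b, c, d):
    """Inverse of [[a, b], [c, d]] from formula (1); fails when ad - bc = 0."""
    det = a * d - b * c
    if det == 0:
        raise ValueError("ad - bc = 0, so the matrix has no inverse")
    return [[Fraction(d, det), Fraction(-b, det)],
            [Fraction(-c, det), Fraction(a, det)]]

def multiply_2x2(x, y):
    """Ordinary 2 by 2 matrix product."""
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

# Sample entries chosen only for illustration: ad - bc = 2*2 - 1*1 = 3.
A = [[2, 1], [1, 2]]
A_inv = inverse_2x2(2, 1, 1, 2)
product = multiply_2x2(A, A_inv)
print(product == [[1, 0], [0, 1]])
```

Calling `inverse_2x2(1, 2, 2, 4)` raises the error, because those rows are parallel and ad - bc = 0.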

The determinant is also connected to the pivots. For a 2 by 2 matrix the pivots are a
and d - (c/a)b. The product of the pivots is the determinant:

Product of pivots $\quad a\left(d - \frac{c}{a}b\right) = ad - bc \quad$ which is det A.

After a row exchange the pivots change to c and b - (a/c)d. Those new pivots multiply to
give bc - ad. The row exchange to $\begin{bmatrix} c & d \\ a & b \end{bmatrix}$ reversed the sign of the determinant.
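
As a small numerical check of the two pivot products, here is a hedged Python sketch. The helper name `pivots_2x2` and the sample entries 2, 1, 6, 8 are illustrative assumptions; the function also assumes the top left entry is nonzero.

```python
from fractions import Fraction

def pivots_2x2(a, b, c, d):
    """Pivots of [[a, b], [c, d]] after one elimination step (assumes a != 0)."""
    a, b, c, d = (Fraction(x) for x in (a, b, c, d))
    return a, d - (c / a) * b

# Sample entries chosen only for illustration.
a, b, c, d = 2, 1, 6, 8

p1, p2 = pivots_2x2(a, b, c, d)     # original row order: pivots a and d - (c/a)b
r1, r2 = pivots_2x2(c, d, a, b)     # rows exchanged: pivots c and b - (a/c)d

product_original = p1 * p2          # equals ad - bc
product_exchanged = r1 * r2         # equals bc - ad
print(product_original, product_exchanged)
```

For these entries ad - bc = 10, and the exchanged rows give the same size with the opposite sign.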
Looking ahead The determinant of an n by n matrix can be found in three ways: 


1 Multiply the n pivots (times 1 or —1) This is the pivot formula. 
2 Add up n! terms (times 1 or —1) This is the “big” formula. 
3 Combine n smaller determinants (times 1 or —1) This is the cofactor formula. 
You see that plus or minus signs, the decisions between 1 and -1, play a big part in
determinants. That comes from the following rule for n by n matrices: 


The determinant changes sign when two rows (or two columns) are exchanged. 


The identity matrix has determinant +1. Exchange two rows and det P = -1. Exchange
two more rows and the new permutation has det P = +1. Half of all permutations are
even (det P = 1) and half are odd (det P = -1). Starting from I, half of the P's involve
an even number of exchanges and half require an odd number. In the 2 by 2 case, ad has a
plus sign and bc has minus, coming from the row exchange:

$$\det \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} = 1 \quad\text{and}\quad \det \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} = -1.$$


The other essential rule is linearity, but a warning comes first. Linearity does not mean
that det(A + B) = det A + det B. This is absolutely false. That kind of linearity is not even
true when A = I and B = I. The false rule would say that det(I + I) = 1 + 1 = 2. The
true rule is det 2I = $2^n$. Determinants are multiplied by $2^n$ (not just by 2) when matrices
are multiplied by 2.
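
The warning is easy to confirm in the 2 by 2 case. This short Python sketch (the helper names `det2` and `add` are assumptions for illustration) compares det(I + I) with det I + det I using only the formula ad - bc.

```python
def det2(m):
    """2 by 2 determinant ad - bc."""
    (a, b), (c, d) = m
    return a * d - b * c

def add(x, y):
    """Entry-by-entry sum of two matrices of the same shape."""
    return [[p + q for p, q in zip(rx, ry)] for rx, ry in zip(x, y)]

I = [[1, 0], [0, 1]]
det_of_sum = det2(add(I, I))        # det(I + I) = det(2I)
sum_of_dets = det2(I) + det2(I)     # what the false rule would predict
print(det_of_sum, sum_of_dets)
```

With n = 2 the true value is $2^2 = 4$ while the false rule predicts 2.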

We don’t intend to define the determinant by its formulas. It is better to start with 
its properties—sign reversal and linearity. The properties are simple (Section 5.1). They 
prepare for the formulas (Section 5.2). Then come the applications, including these three: 


(1) Determinants give $A^{-1}$ and $A^{-1}b$ (this formula is called Cramer's Rule).
(2) When the edges of a box are the rows of A, the volume is |det A|.

(3) For n special numbers $\lambda$, called eigenvalues, the determinant of $A - \lambda I$ is zero.
This is a truly important application and it fills Chapter 6.


The Properties of the Determinant 


Determinants have three basic properties (rules 1, 2, 3). By using those rules we can
compute the determinant of any square matrix A. This number is written in two ways,
det A and |A|. Notice: Brackets for the matrix, straight bars for its determinant. When A
is a 2 by 2 matrix, the three properties lead to the answer we expect:

The determinant of $\begin{bmatrix} a & b \\ c & d \end{bmatrix}$ is $\begin{vmatrix} a & b \\ c & d \end{vmatrix} = ad - bc$.

The last rules are det(AB) = (det A)(det B) and det $A^T$ = det A. We will check all rules
with the 2 by 2 formula, but do not forget: The rules apply to any n by n matrix. We will
show how rules 4 - 10 always follow from 1 - 3.

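Rule 1 (the easiest) matches det I = 1 with the volume = 1 for a unit cube.

1 The determinant of the n by n identity matrix is 1.

$$\begin{vmatrix} 1 & 0 \\ 0 & 1 \end{vmatrix} = 1 \quad\text{and}\quad \begin{vmatrix} 1 & & \\ & \ddots & \\ & & 1 \end{vmatrix} = 1.$$

2 The determinant changes sign when two rows are exchanged (sign reversal):

$$\text{Check:}\quad \begin{vmatrix} c & d \\ a & b \end{vmatrix} = -\begin{vmatrix} a & b \\ c & d \end{vmatrix} \quad (\text{both sides equal } bc - ad).$$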


Because of this rule, we can find det P for any permutation matrix. Just exchange rows 
of I until you reach P. Then det P = +1 for an even number of row exchanges and 
det P = —1 for an odd number. 
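
The counting of row exchanges can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the book's code: a permutation matrix is represented by a tuple `perm`, where row i has its single 1 in column `perm[i]` (numbered from 0), and the name `det_permutation` is hypothetical.

```python
from itertools import permutations

def det_permutation(perm):
    """det P, found by exchanging rows until the identity order is reached."""
    p = list(perm)
    exchanges = 0
    for i in range(len(p)):
        while p[i] != i:
            j = p[i]
            p[i], p[j] = p[j], p[i]   # one row exchange puts row j in place
            exchanges += 1
    return 1 if exchanges % 2 == 0 else -1

# All 4! = 24 permutations of four rows.
signs = [det_permutation(p) for p in permutations(range(4))]
even_count = signs.count(1)
odd_count = signs.count(-1)
print(even_count, odd_count)
```

Any sequence of exchanges that reaches the identity has the same parity, so the particular order chosen here does not affect the sign.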

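The third rule has to make the big jump to the determinants of all matrices.

3 The determinant is a linear function of each row separately (all other rows stay fixed).
If the first row is multiplied by t, the determinant is multiplied by t. If first rows are added,
determinants are added. This rule only applies when the other rows do not change! Notice
how c and d stay the same:

$$\begin{vmatrix} ta & tb \\ c & d \end{vmatrix} = t \begin{vmatrix} a & b \\ c & d \end{vmatrix} \quad\text{and}\quad \begin{vmatrix} a + a' & b + b' \\ c & d \end{vmatrix} = \begin{vmatrix} a & b \\ c & d \end{vmatrix} + \begin{vmatrix} a' & b' \\ c & d \end{vmatrix}.$$

In the first case, both sides are tad - tbc. Then t factors out. In the second case, both sides
are ad + a'd - bc - b'c. These rules still apply when A is n by n, and the last n - 1 rows
don't change. May we emphasize rule 3 with numbers:

$$\begin{vmatrix} 4 & 8 & 8 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{vmatrix} = 4 \begin{vmatrix} 1 & 2 & 2 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{vmatrix} \quad\text{and}\quad \begin{vmatrix} 4 & 8 & 8 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{vmatrix} = \begin{vmatrix} 4 & 0 & 0 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{vmatrix} + \begin{vmatrix} 0 & 8 & 8 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{vmatrix}.$$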


By itself, rule 3 does not say what those determinants are (the first one is 4). 

Combining multiplication and addition, we get any linear combination in one row 
(the other rows must stay the same). Any row can be the one that changes, since rule 2 
for row exchanges can put it up into the first row and back again. 

This rule does not mean that det 2I = 2 det I. To obtain 2I we have to multiply both
rows by 2, and the factor 2 comes out both times:

$$\begin{vmatrix} 2 & 0 \\ 0 & 2 \end{vmatrix} = 2^2 = 4 \quad\text{and}\quad \begin{vmatrix} t & 0 \\ 0 & t \end{vmatrix} = t^2.$$

This is just like area and volume. Expand a rectangle by 2 and its area increases by 4.
Expand an n-dimensional box by t and its volume increases by $t^n$. The connection is no
accident: we will see how determinants equal volumes.
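
The numbers in rule 3 can be checked directly. The sketch below is illustrative only: it assumes the explicit six-term formula for a 3 by 3 determinant (which the book derives later, in Section 5.2) and wraps it in a hypothetical helper `det3`.

```python
def det3(m):
    """3 by 3 determinant written out as its six signed products."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a*e*i + b*f*g + c*d*h - a*f*h - b*d*i - c*e*g

rest = [[0, 1, 1], [0, 0, 1]]        # rows 2 and 3 stay fixed throughout

whole = det3([[4, 8, 8]] + rest)
factored = 4 * det3([[1, 2, 2]] + rest)                       # t = 4 factors out of row 1
split = det3([[4, 0, 0]] + rest) + det3([[0, 8, 8]] + rest)   # row 1 = (4,0,0) + (0,8,8)

# Multiplying every row by 2 multiplies the determinant by 2**3, not by 2.
doubled = det3([[2 * x for x in row] for row in [[4, 8, 8]] + rest])
print(whole, factored, split, doubled)
```

All three ways of handling row 1 give 4, and doubling all three rows gives 32 = 8 times 4.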
Pay special attention to rules 1-3. They completely determine the number det A. We
could stop here to find a formula for n by n determinants (a little complicated). We prefer
to go gradually, with other properties that follow directly from the first three. These extra
rules 4 - 10 make determinants much easier to work with.


4 If two rows of A are equal, then det A = 0.

$$\text{Equal rows}\qquad \text{Check 2 by 2:}\quad \begin{vmatrix} a & b \\ a & b \end{vmatrix} = 0.$$


Rule 4 follows from rule 2. (Remember we must use the rules and not the 2 by 2 formula.) 
Exchange the two equal rows. The determinant D is supposed to change sign. But also D 
has to stay the same, because the matrix is not changed. The only number with -D = D 
is D = 0—this must be the determinant. (Note: In Boolean algebra the reasoning fails, 
because —1 = 1. Then D is defined by rules 1, 3, 4.) 

A matrix with two equal rows has no inverse. Rule 4 makes det A = 0. But matrices 
can be singular and determinants can be zero without having equal rows! Rule 5 will be 
the key. We can do row operations without changing det A. 


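5 Subtracting a multiple of one row from another row leaves det A unchanged.

$$\ell \text{ times row 1 from row 2}\qquad \begin{vmatrix} a & b \\ c - \ell a & d - \ell b \end{vmatrix} = \begin{vmatrix} a & b \\ c & d \end{vmatrix}.$$

Rule 3 (linearity) splits the left side into the right side plus another term $-\ell \begin{vmatrix} a & b \\ a & b \end{vmatrix}$.
This extra term is zero by rule 4. Therefore rule 5 is correct (not just 2 by 2).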


Conclusion The determinant is not changed by the usual elimination steps from A to U.
Thus det A equals det U. If we can find determinants of triangular matrices U, we can
find determinants of all matrices A. Every row exchange reverses the sign, so always
det A = ± det U. Rule 5 has narrowed the problem to triangular matrices.


6 A matrix with a row of zeros has det A = 0.

$$\text{Row of zeros}\qquad \begin{vmatrix} 0 & 0 \\ c & d \end{vmatrix} = 0 \quad\text{and}\quad \begin{vmatrix} a & b \\ 0 & 0 \end{vmatrix} = 0.$$

For an easy proof, add some other row to the zero row. The determinant is not changed
(rule 5). But the matrix now has two equal rows. So det A = 0 by rule 4.


7 If A is triangular then det A = $a_{11}a_{22} \cdots a_{nn}$ = product of diagonal entries.

$$\text{Triangular}\qquad \begin{vmatrix} a & b \\ 0 & d \end{vmatrix} = ad \quad\text{and also}\quad \begin{vmatrix} a & 0 \\ c & d \end{vmatrix} = ad.$$

Suppose all diagonal entries of A are nonzero. Eliminate the off-diagonal entries by the
usual steps. (If A is lower triangular, subtract multiples of each row from lower rows. If A
is upper triangular, subtract from higher rows.) By rule 5 the determinant is not changed,
and now the matrix is diagonal:

$$\text{Diagonal matrix}\qquad \det \begin{bmatrix} a_{11} & & & 0 \\ & a_{22} & & \\ & & \ddots & \\ 0 & & & a_{nn} \end{bmatrix} = a_{11}a_{22} \cdots a_{nn}.$$

Factor $a_{11}$ from the first row by rule 3. Then factor $a_{22}$ from the second row. Eventually
factor $a_{nn}$ from the last row. The determinant is $a_{11}$ times $a_{22}$ times $\cdots$ times $a_{nn}$ times
det I. Then rule 1 (used at last!) is det I = 1.

What if a diagonal entry $a_{ii}$ is zero? Then the triangular A is singular. Elimination
produces a zero row. By rule 5 the determinant is unchanged, and by rule 6 a zero row
means det A = 0. Triangular matrices have easy determinants.

8 If A is singular then det A = 0. If A is invertible then det A ≠ 0.

Singular $\quad \begin{bmatrix} a & b \\ c & d \end{bmatrix}$ is singular if and only if $ad - bc = 0$.

Proof Elimination goes from A to U. If A is singular then U has a zero row. The rules
give det A = det U = 0. If A is invertible then U has the pivots along its diagonal. The
product of nonzero pivots (using rule 7) gives a nonzero determinant:

det A = ± det U = ± (product of the pivots).

The pivots of a 2 by 2 matrix (if a ≠ 0) are a and d - (bc/a):

The determinant is $\begin{vmatrix} a & b \\ c & d \end{vmatrix} = \begin{vmatrix} a & b \\ 0 & d - (bc/a) \end{vmatrix} = ad - bc$.

This is the first formula for the determinant. MATLAB uses it to find det A from the
pivots. The sign in ± det U depends on whether the number of row exchanges is even
or odd. In other words, +1 or -1 is the determinant of the permutation matrix P that
exchanges rows. With no row exchanges, the number zero is even and P = I and det A =
det U = product of pivots. Always det L = 1, because L is triangular with 1's on the
diagonal. What we have is this:


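If PA = LU then det P det A = det L det U. (3)

Again, det P = ±1 and det A = ± det U. Equation (3) is our first case of rule 9.

9 The determinant of AB is det A times det B: |AB| = |A| |B|.

$$\text{Product rule}\qquad \begin{vmatrix} a & b \\ c & d \end{vmatrix} \begin{vmatrix} p & q \\ r & s \end{vmatrix} = \begin{vmatrix} ap + br & aq + bs \\ cp + dr & cq + ds \end{vmatrix}.$$
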
When the matrix B is $A^{-1}$, this rule says that the determinant of $A^{-1}$ is 1/det A:

A times $A^{-1}$ $\qquad AA^{-1} = I$ so $(\det A)(\det A^{-1}) = \det I = 1$.

This product rule is the most intricate so far. Even the 2 by 2 case needs some algebra:

|A| |B| = (ad - bc)(ps - qr) = (ap + br)(cq + ds) - (aq + bs)(cp + dr) = |AB|.

For the n by n case, here is a snappy proof that |AB| = |A| |B|. When |B| is not zero,
consider the ratio D(A) = |AB|/|B|. Check that this ratio has properties 1, 2, 3. Then
D(A) has to be the determinant and we have |A| = |AB|/|B|: good.


Property 1 (Determinant of I) If A = I then the ratio becomes |B|/|B| = 1. 


Property 2 (Sign reversal) When two rows of A are exchanged, so are the same two 
rows of AB. Therefore |AB| changes sign and so does the ratio |AB|/|B|. 


Property 3 (Linearity) When row 1 of A is multiplied by t, so is row 1 of AB. This
multiplies |AB| by t and multiplies the ratio by t, as desired.
Add row 1 of A to row 1 of A'. Then row 1 of AB adds to row 1 of A'B.
By rule 3, determinants add. After dividing by |B|, the ratios add, as desired.

Conclusion This ratio |AB|/|B| has the same three properties that define |A|. Therefore
it equals |A|. This proves the product rule |AB| = |A| |B|. The case |B| = 0 is separate
and easy, because AB is singular when B is singular. Then |AB| = |A| |B| is 0 = 0.
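
The product rule and the ratio D(A) = |AB|/|B| can be tried on small matrices. This Python sketch is a numerical illustration only, not a proof; the helper names and the three sample matrices are assumptions.

```python
from fractions import Fraction

def det2(m):
    """2 by 2 determinant ad - bc."""
    (a, b), (c, d) = m
    return a * d - b * c

def multiply(x, y):
    """Product of two 2 by 2 matrices."""
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

# Sample matrices chosen only for illustration.
A = [[2, 1], [1, 2]]            # |A| = 3
B = [[4, 1], [2, 3]]            # |B| = 10
B_singular = [[1, 2], [2, 4]]   # |B| = 0

AB = multiply(A, B)
ratio = Fraction(det2(AB), det2(B))             # D(A) = |AB| / |B|
det_singular_product = det2(multiply(A, B_singular))
print(det2(AB), ratio, det_singular_product)
```

Here |AB| = 30 = 3 times 10, the ratio returns |A| = 3, and the singular B gives 0 = 0.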


10 The transpose $A^T$ has the same determinant as A.

$$\text{Transpose}\qquad \begin{vmatrix} a & b \\ c & d \end{vmatrix} = \begin{vmatrix} a & c \\ b & d \end{vmatrix} \quad\text{since both sides equal } ad - bc.$$

The equation $|A^T| = |A|$ becomes 0 = 0 when A is singular (we know that $A^T$ is also
singular). Otherwise A has the usual factorization PA = LU. Transposing both sides
gives $A^TP^T = U^TL^T$. The proof of $|A| = |A^T|$ comes by using rule 9 for products:

Compare $\det P \det A = \det L \det U$ with $\det A^T \det P^T = \det U^T \det L^T$.

First, det L = det $L^T$ = 1 (both have 1's on the diagonal). Second, det U = det $U^T$ (those
triangular matrices have the same diagonal). Third, det P = det $P^T$ (permutations have
$P^TP = I$, so $|P^T| |P| = 1$ by rule 9; thus $|P|$ and $|P^T|$ both equal 1 or both equal -1).
So L, U, P have the same determinants as $L^T, U^T, P^T$ and this leaves det A = det $A^T$.

Important comment on columns Every rule for the rows can apply to the columns (just
by transposing, since $|A| = |A^T|$). The determinant changes sign when two columns are
exchanged. A zero column or two equal columns will make the determinant zero. If a
column is multiplied by t, so is the determinant. The determinant is a linear function of
each column separately.

It is time to stop. The list of properties is long enough. Next we find and use an explicit 
formula for the determinant. 
REVIEW OF THE KEY IDEAS


1. The determinant is defined by det I = 1, sign reversal, and linearity in each row.
2. After elimination det A is ± (product of the pivots).
3. The determinant is zero exactly when A is not invertible.
4. Two remarkable properties are det AB = (det A)(det B) and det $A^T$ = det A.


WORKED EXAMPLES


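5.1A Apply these operations to A and find the determinants of $M_1, M_2, M_3$:
In $M_1$, multiplying each $a_{ij}$ by $(-1)^{i+j}$ gives a checkerboard sign pattern.
In $M_2$, rows 1, 2, 3 of A are subtracted from rows 2, 3, 1.
In $M_3$, rows 1, 2, 3 of A are added to rows 2, 3, 1.

How are the determinants of $M_1, M_2, M_3$ related to the determinant of A?

$$M_1 = \begin{bmatrix} a_{11} & -a_{12} & a_{13} \\ -a_{21} & a_{22} & -a_{23} \\ a_{31} & -a_{32} & a_{33} \end{bmatrix} \qquad M_2 = \begin{bmatrix} \text{row 1} - \text{row 3} \\ \text{row 2} - \text{row 1} \\ \text{row 3} - \text{row 2} \end{bmatrix} \qquad M_3 = \begin{bmatrix} \text{row 1} + \text{row 3} \\ \text{row 2} + \text{row 1} \\ \text{row 3} + \text{row 2} \end{bmatrix}$$

Solution The three determinants are det A, 0, and 2 det A. Here are reasons:

$$M_1 = \begin{bmatrix} 1 & & \\ & -1 & \\ & & 1 \end{bmatrix} \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix} \begin{bmatrix} 1 & & \\ & -1 & \\ & & 1 \end{bmatrix} \quad\text{so}\quad \det M_1 = (-1)(\det A)(-1).$$

$M_2$ is singular because its rows add to the zero row. Its determinant is zero.
$M_3$ can be split into eight matrices by Rule 3 (linearity in each row separately):

$$\begin{vmatrix} \text{row 1} + \text{row 3} \\ \text{row 2} + \text{row 1} \\ \text{row 3} + \text{row 2} \end{vmatrix} = \begin{vmatrix} \text{row 1} \\ \text{row 2} \\ \text{row 3} \end{vmatrix} + \begin{vmatrix} \text{row 3} \\ \text{row 2} \\ \text{row 3} \end{vmatrix} + \begin{vmatrix} \text{row 1} \\ \text{row 1} \\ \text{row 3} \end{vmatrix} + \cdots + \begin{vmatrix} \text{row 3} \\ \text{row 1} \\ \text{row 2} \end{vmatrix}.$$

All but the first and last have repeated rows and zero determinant. The first is A and the
last has two row exchanges. So det $M_3$ = det A + det A. (Try A = I.)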

5.1B Explain how to reach this determinant by row operations:

$$\det \begin{bmatrix} 1 - a & 1 & 1 \\ 1 & 1 - a & 1 \\ 1 & 1 & 1 - a \end{bmatrix} = a^2(3 - a). \tag{4}$$

Solution Subtract row 3 from row 1 and then from row 2. This leaves

$$\det \begin{bmatrix} -a & 0 & a \\ 0 & -a & a \\ 1 & 1 & 1 - a \end{bmatrix}.$$

Now add column 1 to column 3, and also column 2 to column 3. This leaves a lower
triangular matrix with -a, -a, 3 - a on the diagonal: det = (-a)(-a)(3 - a).

The determinant is zero if a = 0 or a = 3. For a = 0 we have the all-ones matrix,
certainly singular. For a = 3, each row adds to zero, again singular. Those numbers 0
and 3 are the eigenvalues of the all-ones matrix. This example is revealing and important,
leading toward Chapter 6.
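
The row and column operations in 5.1B can be replayed on numbers. The following Python sketch is only an illustration of those steps; the function names and the choice a = 2 are assumptions.

```python
def shifted_ones(a):
    """The 3 by 3 matrix of worked example 5.1B: all ones with 1 - a on the diagonal."""
    return [[1 - a, 1, 1], [1, 1 - a, 1], [1, 1, 1 - a]]

def reduce_to_triangular(m):
    """Apply the operations of the solution; none of them changes the determinant."""
    m = [row[:] for row in m]
    # Subtract row 3 from row 1 and from row 2 (rule 5).
    m[0] = [x - y for x, y in zip(m[0], m[2])]
    m[1] = [x - y for x, y in zip(m[1], m[2])]
    # Add column 1 and column 2 to column 3 (rule 5 applied to columns).
    for row in m:
        row[2] += row[0] + row[1]
    return m

def diagonal_product(m):
    """Determinant of a triangular matrix (rule 7)."""
    result = 1
    for i in range(len(m)):
        result *= m[i][i]
    return result

T = reduce_to_triangular(shifted_ones(2))
print(T, diagonal_product(T))
```

For a = 2 the triangular matrix has diagonal -2, -2, 1, so the determinant is 4, which agrees with $a^2(3 - a)$.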


Problem Set 5.1 


Questions 1-12 are about the rules for determinants. 
1 If a 4 by 4 matrix has det A = $\frac{1}{2}$, find det(2A) and det(-A) and det($A^2$) and det($A^{-1}$).


2 If a 3 by 3 matrix has det A = -1, find det($\frac{1}{2}A$) and det(-A) and det($A^2$) and
det($A^{-1}$).


3 True or false, with a reason if true or a counterexample if false: 


(a) The determinant of I + A is 1 + det A.
(b) The determinant of ABC is |A| |B| |C|.
(c) The determinant of 4A is 4|A|.

(d) The determinant of AB - BA is zero. Try an example with $A = \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}$.


4 Which row exchanges show that these "reverse identity matrices" $J_3$ and $J_4$ have
$|J_3| = -1$ but $|J_4| = +1$?

$$\det \begin{bmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{bmatrix} = -1 \quad\text{but}\quad \det \begin{bmatrix} 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 \end{bmatrix} = +1.$$

5 For n = 5, 6, 7, count the row exchanges to permute the reverse identity $J_n$ to the
identity matrix $I_n$. Propose a rule for every size n and predict whether $J_{101}$ has
determinant +1 or -1.

6 Show how Rule 6 (determinant = 0 if a row is all zero) comes from Rule 3.

7 Find the determinants of rotations and reflections:

$$Q = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \quad\text{and}\quad Q = \begin{bmatrix} 1 - 2\cos^2\theta & -2\cos\theta\sin\theta \\ -2\cos\theta\sin\theta & 1 - 2\sin^2\theta \end{bmatrix}.$$

8 Prove that every orthogonal matrix ($Q^TQ = I$) has determinant 1 or -1.

(a) Use the product rule |AB| = |A| |B| and the transpose rule $|Q| = |Q^T|$.

(b) Use only the product rule. If |det Q| > 1 then det $Q^n$ = (det Q)$^n$ blows up.
How do you know this can't happen to $Q^n$?


9 Do these matrices have determinant 0, 1, 2, or 3?

$$A = \begin{bmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix} \qquad B = \begin{bmatrix} 0 & 1 & 1 \\ 1 & 0 & 1 \\ 1 & 1 & 0 \end{bmatrix} \qquad C = \begin{bmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix}$$

10 If the entries in every row of A add to zero, solve Ax = 0 to prove det A = 0. If
those entries add to one, show that det(A - I) = 0. Does this mean det A = 1?

11 Suppose that CD = -DC and find the flaw in this reasoning: Taking determinants
gives |C| |D| = -|D| |C|. Therefore |C| = 0 or |D| = 0. One or both of the
matrices must be singular. (That is not true.)

12 The inverse of a 2 by 2 matrix seems to have determinant = 1:

$$\det A^{-1} = \det \frac{1}{ad - bc} \begin{bmatrix} d & -b \\ -c & a \end{bmatrix} = \frac{ad - bc}{ad - bc} = 1.$$

What is wrong with this calculation? What is the correct det $A^{-1}$?

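Questions 13-27 use the rules to compute specific determinants.

13 Reduce A to U and find det A = product of the pivots:

$$A = \begin{bmatrix} 1 & 1 & 1 \\ 1 & 2 & 2 \\ 1 & 2 & 3 \end{bmatrix} \qquad A = \begin{bmatrix} 1 & 2 & 3 \\ 2 & 2 & 3 \\ 3 & 3 & 3 \end{bmatrix}$$

14 By applying row operations to produce an upper triangular U, compute

$$\det \begin{bmatrix} 1 & 2 & 3 & 0 \\ 2 & 6 & 6 & 1 \\ -1 & 0 & 0 & 3 \\ 0 & 2 & 0 & 7 \end{bmatrix} \quad\text{and}\quad \det \begin{bmatrix} 2 & -1 & 0 & 0 \\ -1 & 2 & -1 & 0 \\ 0 & -1 & 2 & -1 \\ 0 & 0 & -1 & 2 \end{bmatrix}.$$
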
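15 Use row operations to simplify and compute these determinants:

$$\det \begin{bmatrix} 101 & 201 & 301 \\ 102 & 202 & 302 \\ 103 & 203 & 303 \end{bmatrix} \quad\text{and}\quad \det \begin{bmatrix} 1 & t & t^2 \\ t & 1 & t \\ t^2 & t & 1 \end{bmatrix}.$$

16 Find the determinants of a rank one matrix and a skew-symmetric matrix:

$$A = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} \begin{bmatrix} 1 & -4 & 5 \end{bmatrix} \quad\text{and}\quad K = \begin{bmatrix} 0 & 1 & 3 \\ -1 & 0 & 4 \\ -3 & -4 & 0 \end{bmatrix}.$$

17 A skew-symmetric matrix has $K^T = -K$. Insert a, b, c for 1, 3, 4 in Question 16
and show that |K| = 0. Write down a 4 by 4 example with |K| = 1.

18 Use row operations to show that the 3 by 3 "Vandermonde determinant" is

$$\det \begin{bmatrix} 1 & a & a^2 \\ 1 & b & b^2 \\ 1 & c & c^2 \end{bmatrix} = (b - a)(c - a)(c - b).$$

19 Find the determinants of U and $U^{-1}$ and $U^2$:

$$U = \begin{bmatrix} 1 & 4 & 6 \\ 0 & 2 & 5 \\ 0 & 0 & 3 \end{bmatrix} \quad\text{and}\quad U = \begin{bmatrix} a & b \\ 0 & d \end{bmatrix}.$$

20 Suppose you do two row operations at once, going from

$$\begin{bmatrix} a & b \\ c & d \end{bmatrix} \quad\text{to}\quad \begin{bmatrix} a - Lc & b - Ld \\ c - la & d - lb \end{bmatrix}.$$

Find the second determinant. Does it equal ad - bc?

21 Row exchange: Add row 1 of A to row 2, then subtract row 2 from row 1. Then add
row 1 to row 2 and multiply row 1 by -1 to reach B. Which rules show

$$\det B = \begin{vmatrix} c & d \\ a & b \end{vmatrix} \quad\text{equals}\quad -\det A = -\begin{vmatrix} a & b \\ c & d \end{vmatrix}\;?$$

Those rules could replace Rule 2 in the definition of the determinant.

22 From ad - bc, find the determinants of A and $A^{-1}$ and $A - \lambda I$:

$$A = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix} \quad\text{and}\quad A^{-1} = \frac{1}{3} \begin{bmatrix} 2 & -1 \\ -1 & 2 \end{bmatrix} \quad\text{and}\quad A - \lambda I = \begin{bmatrix} 2 - \lambda & 1 \\ 1 & 2 - \lambda \end{bmatrix}.$$

Which two numbers $\lambda$ lead to det($A - \lambda I$) = 0? Write down the matrix $A - \lambda I$ for
each of those numbers $\lambda$. It should not be invertible.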
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25 
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From A = [44] find A? and AT! and A — AJ and their determinants. Which two 
numbers A lead to det(A — AJ) = 0? 
Elimination reduces A to U. Then A = LU: 
3 3 4 1 0 O;]3 3 4 
A=} 6 8 7|ļ|=|2 1 O;F;0 2 -1) =LU. 
—3 5 -9 -1 4 14/0 0 -i 
Find the determinants of L, U, A, UTIL}, and UTI LTH A. 
If the i, j entry of A isi times j, show that det A = 0. (Exception when A = [1].) 
If the i, j entry of A isi + 7, show that det A = 0. (Exception when n = 1 or 2.) 


Compute the determinants of these matrices by row operations: 


Q0 a OO 
0 a 0 005 0 aaa 
A=;0 0 b and B= and C=!/a b b 
0 0 000c abc 

C d 000 


True or false (give a reason if true or a 2 by 2 example if false): 


(a) If A is not invertible then AB is not invertible. 

(b) The determinant of A is always the product of its pivots. 
(c) The determinant of A — B equals det A — det B. 

(d) AB and BA have the same determinant. 


What is wrong with this proof that projection matrices have det P = 1? 


l 
P=A(A'A)'A'™ so P| = |Al|———_|A? | = 1. 


(Calculus question) Show that the partial derivatives of In(det A) give A7!! 


af/db Əðf/ðd 


(MATLAB) The Hilbert matrix hilb(x) has i, 7 entry equal to 1/(i + j — 1). Print 
the determinants of hilb(1), hilb(2), ..., hilb(10). Hilbert matrices are hard to work 
with! What are the pivots of hilb (5)? 


f(a,b,c,d) =\n(ad —be) leads to Ee HA = Aq}, 


(MATLAB) What is a typical determinant (experimentally) of rand (7) and randn(n) 
for n = 50, 100, 200, 400? (And what does “Inf” mean in MATLAB?) 


(MATLAB) Find the largest determinant of a 6 by 6 matrix of 1’s and —1’s. 
If you know that det A = 6, what is the determinant of B? 


row 1 row 3 + row 2+ row 1 
From det A = |row 2| = 6 find det B = row 2 + row 1 
row 3 row 1 
5.2 Permutations and Cofactors


A computer finds the determinant from the pivots. This section explains two other ways
to do it. There is a "big formula" using all n! permutations. There is a "cofactor formula"
using determinants of size n - 1. The best example is my favorite 4 by 4 matrix:

$$A = \begin{bmatrix} 2 & -1 & 0 & 0 \\ -1 & 2 & -1 & 0 \\ 0 & -1 & 2 & -1 \\ 0 & 0 & -1 & 2 \end{bmatrix} \quad\text{has}\quad \det A = 5.$$

We can find this determinant in all three ways: pivots, big formula, cofactors.

1. The product of the pivots is $2 \cdot \frac{3}{2} \cdot \frac{4}{3} \cdot \frac{5}{4}$. Cancellation produces 5.

2. The "big formula" in equation (8) has 4! = 24 terms. Only five terms are nonzero:
det A = 16 - 4 - 4 - 4 + 1 = 5.
The 16 comes from 2 · 2 · 2 · 2 on the diagonal of A. Where do -4 and +1 come
from? When you can find those five terms, you have understood formula (8).

3. The numbers 2, -1, 0, 0 in the first row multiply their cofactors 4, 3, 2, 1 from the
other rows. That gives 2 · 4 - 1 · 3 = 5. Those cofactors are 3 by 3 determinants.
Cofactors use the rows and columns that are not used by the entry in the first row.
Every term in a determinant uses each row and column once!
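
Two of the three counts above, the five nonzero terms and the cofactors 4, 3, 2, 1, can be reproduced by brute force. The Python sketch below is an illustration under assumptions: the helper names are hypothetical, and the sign of each term is taken from the parity of inversions in the column order, a standard equivalent of counting exchanges.

```python
from itertools import permutations

A = [[2, -1, 0, 0],
     [-1, 2, -1, 0],
     [0, -1, 2, -1],
     [0, 0, -1, 2]]

def sign(cols):
    """+1 for an even column order, -1 for an odd one (parity of inversions)."""
    n = len(cols)
    inversions = sum(cols[i] > cols[j] for i in range(n) for j in range(i + 1, n))
    return -1 if inversions % 2 else 1

def nonzero_terms(m):
    """Signed products, one entry from each row and column, with zeros dropped."""
    terms = []
    for cols in permutations(range(len(m))):
        term = sign(cols)
        for row, col in enumerate(cols):
            term *= m[row][col]
        if term:
            terms.append(term)
    return terms

def cofactor(m, i, j):
    """Signed determinant of m with row i and column j removed."""
    minor = [row[:j] + row[j + 1:] for k, row in enumerate(m) if k != i]
    return (-1) ** (i + j) * sum(nonzero_terms(minor))

terms = nonzero_terms(A)
first_row_cofactors = [cofactor(A, 0, j) for j in range(4)]
by_cofactors = sum(A[0][j] * first_row_cofactors[j] for j in range(4))
print(sorted(terms), first_row_cofactors, by_cofactors)
```

Trying all 24 column orders is fine for a 4 by 4 matrix, but the n! growth makes this approach impractical for large n.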


The Pivot Formula 


Elimination leaves the pivots $d_1, \ldots, d_n$ on the diagonal of the upper triangular U. If no
row exchanges are involved, multiply those pivots to find the determinant:

$$\det A = (\det L)(\det U) = (1)(d_1d_2 \cdots d_n). \tag{1}$$

This formula for det A appeared in the previous section, with the further possibility of row
exchanges. The permutation matrix in PA = LU has determinant -1 or +1. This factor
det P = ±1 enters the determinant of A:

$$\det A = \pm(d_1d_2 \cdots d_n). \tag{2}$$

When A has fewer than n pivots, det A = 0 by Rule 8. The matrix is singular.

Example 1 A row exchange produces pivots 4, 2, 1 and that important minus sign:

$$A = \begin{bmatrix} 0 & 0 & 1 \\ 0 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix} \qquad PA = \begin{bmatrix} 4 & 5 & 6 \\ 0 & 2 & 3 \\ 0 & 0 & 1 \end{bmatrix} \qquad \det A = -(4)(2)(1) = -8.$$

The odd number of row exchanges (namely one exchange) means that det P = -1.
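
The pivot formula translates into a short elimination routine. What follows is a sketch, not MATLAB's implementation: the function name `det_by_elimination` is an assumption, exact fractions replace floating point, and a row exchange is made only when a zero appears in the pivot position.

```python
from fractions import Fraction

def det_by_elimination(matrix):
    """Return (determinant, pivots, row_exchanges) for a square matrix."""
    u = [[Fraction(x) for x in row] for row in matrix]
    n = len(u)
    exchanges = 0
    pivots = []
    for k in range(n):
        # Look for a row at or below row k with a nonzero entry in column k.
        swap = next((r for r in range(k, n) if u[r][k] != 0), None)
        if swap is None:
            return Fraction(0), pivots, exchanges   # fewer than n pivots: singular
        if swap != k:
            u[k], u[swap] = u[swap], u[k]           # each exchange reverses the sign
            exchanges += 1
        pivots.append(u[k][k])
        for r in range(k + 1, n):
            factor = u[r][k] / u[k][k]
            u[r] = [x - factor * y for x, y in zip(u[r], u[k])]
    product = Fraction(1)
    for p in pivots:
        product *= p
    sign = -1 if exchanges % 2 else 1
    return sign * product, pivots, exchanges

result = det_by_elimination([[0, 0, 1], [0, 2, 3], [4, 5, 6]])
print(result)
```

For the matrix of Example 1 this reports pivots 4, 2, 1, one exchange, and determinant -8.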


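The next example has no row exchanges. It may be the first matrix we factored into
LU (when it was 3 by 3). What is remarkable is that we can go directly to n by n. Pivots
give the determinant. We will also see how determinants give the pivots.

Example 2 The first pivots of this tridiagonal matrix A are $2, \frac{3}{2}, \frac{4}{3}$. The next are $\frac{5}{4}$ and
$\frac{6}{5}$ and eventually $\frac{n+1}{n}$. Factoring this n by n matrix reveals its determinant:

$$\begin{bmatrix} 2 & -1 & & & \\ -1 & 2 & -1 & & \\ & -1 & 2 & \cdot & \\ & & \cdot & \cdot & -1 \\ & & & -1 & 2 \end{bmatrix} = \begin{bmatrix} 1 & & & & \\ -\frac{1}{2} & 1 & & & \\ & -\frac{2}{3} & 1 & & \\ & & \cdot & \cdot & \\ & & & -\frac{n-1}{n} & 1 \end{bmatrix} \begin{bmatrix} 2 & -1 & & & \\ & \frac{3}{2} & -1 & & \\ & & \frac{4}{3} & -1 & \\ & & & \cdot & \cdot \\ & & & & \frac{n+1}{n} \end{bmatrix}$$

The pivots are on the diagonal of U (the last matrix). When 2 and $\frac{3}{2}$ and $\frac{4}{3}$ and $\frac{5}{4}$ are
multiplied, the fractions cancel. The determinant of the 4 by 4 matrix is 5. The 3 by 3
determinant is 4. The n by n determinant is n + 1:

$$-1, 2, -1 \text{ matrix}\qquad \det A = (2)\left(\tfrac{3}{2}\right)\left(\tfrac{4}{3}\right) \cdots \left(\tfrac{n+1}{n}\right) = n + 1.$$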


Important point: The first pivots depend only on the upper left corner of the original
matrix A. This is a rule for all matrices without row exchanges.

The first k pivots come from the k by k matrix $A_k$ in the top left corner of A.
The determinant of that corner submatrix $A_k$ is $d_1d_2 \cdots d_k$.

The 1 by 1 matrix $A_1$ contains the very first pivot $d_1$. This is det $A_1$. The 2 by 2 matrix in
the corner has det $A_2 = d_1d_2$. Eventually the n by n determinant uses the product of all n
pivots to give det $A_n$ which is det A.

Elimination deals with the corner matrix $A_k$ while starting on the whole matrix. We
assume no row exchanges. Then A = LU and $A_k = L_kU_k$. Dividing one determinant
by the previous determinant (det $A_k$ divided by det $A_{k-1}$) cancels everything but the latest
pivot $d_k$. This gives a ratio of determinants formula for the pivots:

$$d_k = \frac{d_1d_2 \cdots d_k}{d_1d_2 \cdots d_{k-1}} = \frac{\det A_k}{\det A_{k-1}}. \tag{3}$$

In the -1, 2, -1 matrices this ratio correctly gives the pivots $\frac{2}{1}, \frac{3}{2}, \frac{4}{3}, \ldots, \frac{n+1}{n}$. The Hilbert
matrices in Problem 5.1.31 also build from the upper left corner.
We don't need row exchanges when all these corner submatrices have det $A_k \neq 0$.

The Big Formula for Determinants 


Pivots are good for computing. They concentrate a lot of information, enough to find the
determinant. But it is hard to connect them to the original $a_{ij}$. That part will be clearer if
we go back to rules 1-2-3, linearity and sign reversal and det I = 1. We want to derive a
single explicit formula for the determinant, directly from the entries $a_{ij}$.

The formula has n! terms. Its size grows fast because n! = 1, 2, 6, 24, 120, .... For
n = 11 there are about forty million terms. For n = 2, the two terms are ad and bc. Half
the terms have minus signs (as in -bc). The other half have plus signs (as in ad). For
n = 3 there are 3! = (3)(2)(1) terms. Here are those six terms:

$$\begin{aligned} &+a_{11}a_{22}a_{33} + a_{12}a_{23}a_{31} + a_{13}a_{21}a_{32} \\ &-a_{11}a_{23}a_{32} - a_{12}a_{21}a_{33} - a_{13}a_{22}a_{31}. \end{aligned} \tag{4}$$

Notice the pattern. Each product like $a_{11}a_{23}a_{32}$ has one entry from each row. It also has
one entry from each column. The column order 1, 3, 2 means that this particular term
comes with a minus sign. The column order 3, 1, 2 in $a_{13}a_{21}a_{32}$ has a plus sign. It will be
"permutations" that tell us the sign.

The next step (n = 4) brings 4! = 24 terms. There are 24 ways to choose one entry
from each row and column. Down the main diagonal, $a_{11}a_{22}a_{33}a_{44}$ with column order
1, 2, 3, 4 always has a plus sign. That is the "identity permutation".

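To derive the big formula I start with $n = 2$. The goal is to reach $ad - bc$ in a systematic
way. Break each row into two simpler rows:

$$[\,a \;\; b\,] = [\,a \;\; 0\,] + [\,0 \;\; b\,] \quad\text{and}\quad [\,c \;\; d\,] = [\,c \;\; 0\,] + [\,0 \;\; d\,].$$

Now apply linearity, first in row 1 (with row 2 fixed) and then in row 2 (with row 1 fixed):

$$\begin{vmatrix} a & b \\ c & d \end{vmatrix} = \begin{vmatrix} a & 0 \\ c & d \end{vmatrix} + \begin{vmatrix} 0 & b \\ c & d \end{vmatrix} = \begin{vmatrix} a & 0 \\ c & 0 \end{vmatrix} + \begin{vmatrix} a & 0 \\ 0 & d \end{vmatrix} + \begin{vmatrix} 0 & b \\ c & 0 \end{vmatrix} + \begin{vmatrix} 0 & b \\ 0 & d \end{vmatrix}.$$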


The last line has $2^2 = 4$ determinants. The first and fourth are zero because their rows are
dependent: one row is a multiple of the other row. We are left with $2! = 2$ determinants
to compute:

$$\begin{vmatrix} a & 0 \\ 0 & d \end{vmatrix} + \begin{vmatrix} 0 & b \\ c & 0 \end{vmatrix} = ad \begin{vmatrix} 1 & 0 \\ 0 & 1 \end{vmatrix} + bc \begin{vmatrix} 0 & 1 \\ 1 & 0 \end{vmatrix} = ad - bc.$$

The splitting led to permutation matrices. Their determinants give a plus or minus sign.
The 1’s are multiplied by numbers that come from $A$. The permutation tells the column
sequence, in this case (1, 2) or (2, 1).

Now try $n = 3$. Each row splits into 3 simpler rows like $[\,a_{11} \;\; 0 \;\; 0\,]$. Using linearity in
each row, $\det A$ splits into $3^3 = 27$ simple determinants. If a column choice is repeated,
for example if we also choose $[\,a_{21} \;\; 0 \;\; 0\,]$, then the simple determinant is zero. We pay
attention only when the nonzero terms come from different columns.

There are $3! = 6$ ways to order the columns, so six determinants. The six permutations
of (1, 2, 3) include the identity permutation (1, 2, 3) from $P = I$:


$$\text{Column numbers} = (1,2,3),\ (2,3,1),\ (3,1,2),\ (1,3,2),\ (2,1,3),\ (3,2,1). \qquad (6)$$


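The last three are odd permutations (one exchange). The first three are even permutations
(0 or 2 exchanges). When the column sequence is $(\alpha, \beta, \omega)$, we have chosen the entries
$a_{1\alpha}a_{2\beta}a_{3\omega}$, and the column sequence comes with a plus or minus sign. The determinant
of $A$ is now split into six simple terms. Factor out the $a_{ij}$:

$$\begin{aligned} \det A ={} & a_{11}a_{22}a_{33} \begin{vmatrix} 1 & & \\ & 1 & \\ & & 1 \end{vmatrix} + a_{12}a_{23}a_{31} \begin{vmatrix} & 1 & \\ & & 1 \\ 1 & & \end{vmatrix} + a_{13}a_{21}a_{32} \begin{vmatrix} & & 1 \\ 1 & & \\ & 1 & \end{vmatrix} \\ & + a_{11}a_{23}a_{32} \begin{vmatrix} 1 & & \\ & & 1 \\ & 1 & \end{vmatrix} + a_{12}a_{21}a_{33} \begin{vmatrix} & 1 & \\ 1 & & \\ & & 1 \end{vmatrix} + a_{13}a_{22}a_{31} \begin{vmatrix} & & 1 \\ & 1 & \\ 1 & & \end{vmatrix}. \end{aligned} \qquad (7)$$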

The first three (even) permutations have $\det P = +1$, the last three (odd) permutations
have $\det P = -1$. We have proved the 3 by 3 formula in a systematic way.

Now you can see the $n$ by $n$ formula. There are $n!$ orderings of the columns. The
columns $(1, 2, \ldots, n)$ go in each possible order $(\alpha, \beta, \ldots, \omega)$. Taking $a_{1\alpha}$ from row 1
and $a_{2\beta}$ from row 2 and eventually $a_{n\omega}$ from row $n$, the determinant contains the product
$a_{1\alpha}a_{2\beta}\cdots a_{n\omega}$ times $+1$ or $-1$. Half the column orderings have sign $-1$.

The complete determinant of $A$ is the sum of these $n!$ simple determinants, times $1$
or $-1$. The simple determinants $a_{1\alpha}a_{2\beta}\cdots a_{n\omega}$ choose one entry from every row and
column:

$$\det A = \sum_{\text{all } n! \text{ column permutations } P = (\alpha, \beta, \ldots, \omega)} (\det P)\, a_{1\alpha}a_{2\beta}\cdots a_{n\omega}. \qquad (8)$$


The 2 by 2 case is $+a_{11}a_{22} - a_{12}a_{21}$ (which is $ad - bc$). Here $P$ is (1, 2) or (2, 1).

The 3 by 3 case has three products “down to the right” (see Problem 28) and three 
products “down to the left”. Warning: Many people believe they should follow this pattern 
in the 4 by 4 case. They only take 8 products—but we need 24. 
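The big formula (8) can be checked directly on small matrices. The sketch below is an illustration and not code from the text: the names `permutation_sign` and `big_formula` are invented here, rows and columns are numbered from 0, and the sign of each column ordering is found by counting inversions (an even count means $\det P = +1$).

```python
from itertools import permutations

def permutation_sign(cols):
    """Return +1 for an even column ordering and -1 for an odd one."""
    inversions = sum(1
                     for i in range(len(cols))
                     for j in range(i + 1, len(cols))
                     if cols[i] > cols[j])
    return -1 if inversions % 2 else 1

def big_formula(a):
    """Sum over all n! column orderings of (sign) * a[0][c0] * a[1][c1] * ..."""
    n = len(a)
    total = 0
    for cols in permutations(range(n)):
        product = 1
        for row, col in enumerate(cols):
            product *= a[row][col]  # one entry from each row and each column
        total += permutation_sign(cols) * product
    return total

# The 4 by 4 case really does use all 24 orderings, not 8 diagonal products.
a4 = [[2, -1, 0, 0], [-1, 2, -1, 0], [0, -1, 2, -1], [0, 0, -1, 2]]
print(big_formula(a4))
```

The work grows like $n \cdot n!$, so this is a way to understand the formula and not a way to compute large determinants.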


Example 3 (Determinant of U) When $U$ is upper triangular, only one of the $n!$ products
can be nonzero. This one term comes from the diagonal: $\det U = +u_{11}u_{22}\cdots u_{nn}$. All
other column orderings pick at least one entry below the diagonal, where $U$ has zeros. As
soon as we pick a number like $u_{21} = 0$ from below the diagonal, that term in equation (8)
is sure to be zero.

Of course $\det I = 1$. The only nonzero term is $+(1)(1)\cdots(1)$ from the diagonal.


Example 4 Suppose $Z$ is the identity matrix except for column 3. Then

$$\text{determinant of } Z = \begin{vmatrix} 1 & 0 & a & 0 \\ 0 & 1 & b & 0 \\ 0 & 0 & c & 0 \\ 0 & 0 & d & 1 \end{vmatrix} = c. \qquad (9)$$

The term (1)(1)(c)(1) comes from the main diagonal with a plus sign. There are 23 other 
products (choosing one factor from each row and column) but they are all zero. Reason: If 
we pick a, b, or d from column 3, that column is used up. Then the only available choice 
from row 3 is zero. 

Here is a different reason for the same answer. If $c = 0$, then $Z$ has a row of zeros and
$\det Z = c = 0$ is correct. If $c$ is not zero, use elimination. Subtract multiples of row 3
from the other rows, to knock out $a, b, d$. That leaves a diagonal matrix and $\det Z = c$.

This example will soon be used for “Cramer’s Rule”. If we move $a, b, c, d$ into the
first column of $Z$, the determinant is $\det Z = a$. (Why?) Changing one column of $I$ leaves
$Z$ with an easy determinant, coming from its main diagonal only.


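Example 5 Suppose $A$ has 1’s just above and below the main diagonal. Here $n = 4$:

$$A_4 = \begin{bmatrix} 0 & 1 & 0 & 0 \\ 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{bmatrix} \quad\text{and}\quad P_4 = \begin{bmatrix} 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{bmatrix} \quad\text{have determinant 1.}$$

The only nonzero choice in the first row is column 2. The only nonzero choice in row 4 is
column 3. Then rows 2 and 3 must choose columns 1 and 4. In other words $P_4$ is the only
permutation that picks out nonzeros in $A_4$. The determinant of $P_4$ is $+1$ (two exchanges to
reach 2, 1, 4, 3). Therefore $\det A_4 = +1$.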


Determinant by Cofactors 


Formula (8) is a direct definition of the determinant. It gives you everything at once, but
you have to digest it. Somehow this sum of $n!$ terms must satisfy rules 1-2-3 (then all the
other properties follow). The easiest is $\det I = 1$, already checked. The rule of linearity
becomes clear, if you separate out the factor $a_{11}$ or $a_{12}$ or $a_{13}$ that comes from the first
row. For 3 by 3, separate the usual 6 terms of the determinant into 3 pairs:

$$\det A = a_{11}(a_{22}a_{33} - a_{23}a_{32}) + a_{12}(a_{23}a_{31} - a_{21}a_{33}) + a_{13}(a_{21}a_{32} - a_{22}a_{31}). \qquad (10)$$

Those three quantities in parentheses are called “cofactors”. They are 2 by 2 determinants,
coming from matrices in rows 2 and 3. The first row contributes the factors $a_{11}, a_{12}, a_{13}$.
The lower rows contribute the cofactors $C_{11}, C_{12}, C_{13}$. Certainly the determinant $a_{11}C_{11} +
a_{12}C_{12} + a_{13}C_{13}$ depends linearly on $a_{11}, a_{12}, a_{13}$. This is rule 3.

The cofactor of $a_{11}$ is $C_{11} = a_{22}a_{33} - a_{23}a_{32}$. You can see it in this splitting:

$$\begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix} = \begin{vmatrix} a_{11} & & \\ & a_{22} & a_{23} \\ & a_{32} & a_{33} \end{vmatrix} + \begin{vmatrix} & a_{12} & \\ a_{21} & & a_{23} \\ a_{31} & & a_{33} \end{vmatrix} + \begin{vmatrix} & & a_{13} \\ a_{21} & a_{22} & \\ a_{31} & a_{32} & \end{vmatrix}.$$


We are still choosing one entry from each row and column. Since $a_{11}$ uses up row 1 and
column 1, that leaves a 2 by 2 determinant as its cofactor.

As always, we have to watch signs. The 2 by 2 determinant that goes with $a_{12}$ looks
like $a_{21}a_{33} - a_{23}a_{31}$. But in the cofactor $C_{12}$, its sign is reversed. Then $a_{12}C_{12}$ gives the
correct terms of the 3 by 3 determinant. The sign pattern for cofactors along the first row is
plus-minus-plus-minus. You cross out row 1 and column $j$ to get a submatrix $M_{1j}$ of size $n - 1$.
Multiply its determinant by $(-1)^{1+j}$ to get the cofactor:

$$\text{The cofactors along row 1 are}\quad C_{1j} = (-1)^{1+j} \det M_{1j}.$$

$$\text{The cofactor expansion is}\quad \det A = a_{11}C_{11} + a_{12}C_{12} + \cdots + a_{1n}C_{1n}. \qquad (11)$$

In the big formula (8), the terms that multiply $a_{11}$ combine to give $\det M_{11}$. The sign
is $(-1)^{1+1}$, meaning plus. Equation (11) is another form of equation (8) and also equation (10),
with factors from row 1 multiplying cofactors that use the other rows.


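Note Whatever is possible for row 1 is possible for row $i$. The entries $a_{ij}$ in that row also
have cofactors $C_{ij}$. Those are determinants of order $n - 1$, multiplied by $(-1)^{i+j}$. Since
$a_{ij}$ accounts for row $i$ and column $j$, the submatrix $M_{ij}$ throws out row $i$ and column $j$.
The display shows $a_{43}$ and $M_{43}$ (with row 4 and column 3 removed). The sign $(-1)^{4+3}$
multiplies the determinant of $M_{43}$ to give $C_{43}$. The sign matrix shows the $\pm$ pattern:

$$A = \begin{bmatrix} \bullet & \bullet & \times & \bullet \\ \bullet & \bullet & \times & \bullet \\ \bullet & \bullet & \times & \bullet \\ \times & \times & a_{43} & \times \end{bmatrix} \qquad \text{signs } (-1)^{i+j} = \begin{bmatrix} + & - & + & - \\ - & + & - & + \\ + & - & + & - \\ - & + & - & + \end{bmatrix}$$

The determinant is the dot product of any row $i$ of $A$ with its cofactors, and each cofactor includes its sign:

$$\det A = a_{i1}C_{i1} + a_{i2}C_{i2} + \cdots + a_{in}C_{in} \qquad (12)$$

$$C_{ij} = (-1)^{i+j} \det M_{ij}$$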


A determinant of order $n$ is a combination of determinants of order $n - 1$. A recursive
person would keep going. Each subdeterminant breaks into determinants of order $n - 2$.
We could define all determinants via equation (12). This rule goes from order $n$ to $n - 1$
to $n - 2$ and eventually to order 1. Define the 1 by 1 determinant $|a|$ to be the number $a$.
Then the cofactor method is complete.
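The recursive definition translates almost word for word into code. This is a sketch for illustration only (the names `minor` and `det_cofactor` are not from the text). It expands along row 1 each time, and because Python counts from 0 the sign $(-1)^{1+j}$ appears as `(-1) ** j`.

```python
def minor(a, i, j):
    """Submatrix M_ij: remove row i and column j (both counted from 0)."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(a) if k != i]

def det_cofactor(a):
    """Determinant by cofactor expansion along the first row."""
    if len(a) == 1:
        return a[0][0]  # the 1 by 1 determinant |a| is the number a
    total = 0
    for j, entry in enumerate(a[0]):
        if entry != 0:  # zeros in the row need no cofactor
            total += (-1) ** j * entry * det_cofactor(minor(a, 0, j))
    return total

print(det_cofactor([[1, 2, 3], [4, 5, 6], [7, 0, 0]]))
```

Skipping the zero entries is what makes cofactors attractive for sparse matrices. For a full matrix the recursion still does on the order of $n!$ multiplications.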

We preferred to construct $\det A$ from its properties (linearity, sign reversal, $\det I = 1$).
The big formula (8) and the cofactor formulas (10)-(12) follow from those properties.
One last formula comes from the rule that $\det A = \det A^{\mathrm T}$. We can expand in cofactors,
down a column instead of across a row. Down column $j$ the entries are $a_{1j}$ to $a_{nj}$. The
cofactors are $C_{1j}$ to $C_{nj}$. The determinant is the dot product:

$$\text{Cofactors down column } j: \quad \det A = a_{1j}C_{1j} + a_{2j}C_{2j} + \cdots + a_{nj}C_{nj}. \qquad (13)$$

Cofactors are useful when matrices have many zeros, as in the next examples.


Example 6 The $-1, 2, -1$ matrix has only two nonzeros in its first row. So only two
cofactors $C_{11}$ and $C_{12}$ are involved in the determinant. I will highlight $C_{12}$:

$$\begin{vmatrix} 2 & -1 & & \\ -1 & 2 & -1 & \\ & -1 & 2 & -1 \\ & & -1 & 2 \end{vmatrix} = 2 \begin{vmatrix} 2 & -1 & \\ -1 & 2 & -1 \\ & -1 & 2 \end{vmatrix} - (-1) \begin{vmatrix} \mathbf{-1} & \mathbf{-1} & \\ & \mathbf{2} & \mathbf{-1} \\ & \mathbf{-1} & \mathbf{2} \end{vmatrix}. \qquad (14)$$

You see 2 times $C_{11}$ first on the right, from crossing out row 1 and column 1. This cofactor
has exactly the same $-1, 2, -1$ pattern as the original $A$, but one size smaller.

To compute the boldface $C_{12}$, use cofactors down its first column. The only nonzero
is at the top. That contributes another $-1$ (so we are back to minus). Its cofactor is the
$-1, 2, -1$ determinant which is 2 by 2, two sizes smaller than the original $A$.


Summary Each determinant $D_n$ of order $n$ comes from $D_{n-1}$ and $D_{n-2}$:

$$D_4 = 2D_3 - D_2 \quad\text{and generally}\quad D_n = 2D_{n-1} - D_{n-2}. \qquad (15)$$

Direct calculation gives $D_2 = 3$ and $D_3 = 4$. Equation (14) has $D_4 = 2(4) - 3 = 5$.
These determinants 3, 4, 5 fit the formula $D_n = n + 1$. That “special tridiagonal answer”
also came from the product of pivots in Example 2.

The idea behind cofactors is to reduce the order one step at a time. The determinants
$D_n = n + 1$ obey the recursion formula $n + 1 = 2n - (n - 1)$. As they must.


Example 7 This is the same matrix, except the first entry (upper left) is now 1:

$$B_4 = \begin{bmatrix} 1 & -1 & & \\ -1 & 2 & -1 & \\ & -1 & 2 & -1 \\ & & -1 & 2 \end{bmatrix}$$

All pivots of this matrix turn out to be 1. So its determinant is 1. How does that come
from cofactors? Expanding on row 1, the cofactors all agree with Example 6. Just change
$a_{11} = 2$ to $b_{11} = 1$:

$$\det B_4 = D_3 - D_2 \quad\text{instead of}\quad \det A_4 = 2D_3 - D_2.$$

The determinant of $B_4$ is $4 - 3 = 1$. The determinant of every $B_n$ is $n - (n - 1) = 1$.
Problem 17 asks you to use cofactors of the last row. You still find $\det B_n = 1$.
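Both families of determinants obey the same three-term recursion and differ only in where they start. A small sketch (the function name and the order-0 value of 1 are assumptions made here for convenience, not part of the text) runs the recursion for the $A$’s of Example 6 and the $B$’s of Example 7:

```python
def tridiagonal_dets(first_entry, n_max):
    """Determinants of orders 1..n_max for the -1, 2, -1 matrix whose
    upper left entry is first_entry (2 for A_n, 1 for B_n)."""
    dets = [1, first_entry]  # order 0 (assumed to be 1) and order 1
    for _ in range(2, n_max + 1):
        dets.append(2 * dets[-1] - dets[-2])  # D_n = 2 D_{n-1} - D_{n-2}
    return dets[1:]

print(tridiagonal_dets(2, 6))  # the A's: D_n = n + 1
print(tridiagonal_dets(1, 6))  # the B's: every determinant is 1
```

The starting values 2, 3 lead to $n + 1$, and the starting values 1, 1 stay at 1 forever.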
= REVIEW OF THE KEY IDEAS =


1. With no row exchanges, $\det A$ = (product of pivots). In the upper left corner, $\det A_k$
= (product of the first $k$ pivots).


2. Every term in the big formula (8) uses each row and column once. Half of the $n!$
terms have plus signs (when $\det P = +1$) and half have minus signs.


3. The cofactor $C_{ij}$ is $(-1)^{i+j}$ times the smaller determinant that omits row $i$ and
column $j$ (because $a_{ij}$ uses that row and column).


4. The determinant is the dot product of any row of $A$ with its row of cofactors. When
a row of $A$ has a lot of zeros, we only need a few cofactors.


= WORKED EXAMPLES = 


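5.2A A Hessenberg matrix is a triangular matrix with one extra diagonal. Use cofactors
of row 1 to show that the 4 by 4 determinant satisfies Fibonacci’s rule $|H_4| = |H_3| + |H_2|$.
The same rule will continue for all sizes, $|H_n| = |H_{n-1}| + |H_{n-2}|$. Which Fibonacci
number is $|H_n|$?

$$H_4 = \begin{bmatrix} 2 & 1 & & \\ 1 & 2 & 1 & \\ 1 & 1 & 2 & 1 \\ 1 & 1 & 1 & 2 \end{bmatrix}$$

Solution The cofactor $C_{11}$ for $H_4$ is the determinant $|H_3|$. We also need $C_{12}$ (in boldface):

$$C_{12} = -\begin{vmatrix} \mathbf{1} & \mathbf{1} & \mathbf{0} \\ \mathbf{1} & \mathbf{2} & \mathbf{1} \\ \mathbf{1} & \mathbf{1} & \mathbf{2} \end{vmatrix} = -\begin{vmatrix} 2 & 1 & 0 \\ 1 & 2 & 1 \\ 1 & 1 & 2 \end{vmatrix} + \begin{vmatrix} 1 & 0 & 0 \\ 1 & 2 & 1 \\ 1 & 1 & 2 \end{vmatrix}$$

Rows 2 and 3 stayed the same and we used linearity in row 1. The two determinants on the
right are $-|H_3|$ and $+|H_2|$. Then the 4 by 4 determinant is

$$|H_4| = 2C_{11} + 1C_{12} = 2|H_3| - |H_3| + |H_2| = |H_3| + |H_2|.$$

The actual numbers are $|H_2| = 3$ and $|H_3| = 5$ (and of course $|H_1| = 2$). Since $|H_n|$
follows Fibonacci’s rule $|H_{n-1}| + |H_{n-2}|$, it must be $|H_n| = F_{n+2}$.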
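The Fibonacci pattern is easy to test numerically for the first few sizes. In this sketch the helper names are invented, the matrix builder assumes the pattern of $H_4$ continues (2 on the diagonal, 1 just above it, 1 everywhere below), and the Fibonacci numbers start $F_1 = F_2 = 1$.

```python
def det(a):
    """Cofactor expansion along the first row (fine for small matrices)."""
    if not a:
        return 1  # empty determinant, so that a 1 by 1 matrix returns its entry
    return sum((-1) ** j * a[0][j] * det([row[:j] + row[j + 1:] for row in a[1:]])
               for j in range(len(a)) if a[0][j] != 0)

def hessenberg(n):
    """H_n: 2 on the diagonal, 1 on the superdiagonal and below the diagonal, else 0."""
    return [[2 if j == i else 1 if j <= i + 1 else 0 for j in range(n)]
            for i in range(n)]

def fibonacci(k):
    """F_1 = F_2 = 1, F_3 = 2, F_4 = 3, ..."""
    a, b = 1, 1
    for _ in range(k - 1):
        a, b = b, a + b
    return a

for n in range(1, 7):
    print(n, det(hessenberg(n)), fibonacci(n + 2))
```

Each printed row shows the same number twice, in agreement with $|H_n| = F_{n+2}$ for these sizes.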


5.2B These questions use the $\pm$ signs (even and odd $P$’s) in the big formula for $\det A$:


1. If $A$ is the 10 by 10 all-ones matrix, how does the big formula give $\det A = 0$?


2. If you multiply all $n!$ permutations together into a single $P$, is $P$ odd or even?


3. If you multiply each $a_{ij}$ by the fraction $i/j$, why is $\det A$ unchanged?


Solution In Question 1, with all $a_{ij} = 1$, all the products in the big formula (8) will
be 1. Half of them come with a plus sign, and half with minus. So they cancel to leave
$\det A = 0$. (Of course the all-ones matrix is singular.)

In Question 2, multiplying $\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$ gives an odd permutation. Also for 3 by 3, the
three odd permutations multiply (in any order) to give odd. But for $n > 3$ the product of
all permutations will be even. There are $n!/2$ odd permutations and that is an even number
as soon as $n!$ includes the factor 4.

In Question 3, each $a_{ij}$ is multiplied by $i/j$. So each product $a_{1\alpha}a_{2\beta}\cdots a_{n\omega}$ in the
big formula is multiplied by all the row numbers $i = 1, 2, \ldots, n$ and divided by all the
column numbers $j = 1, 2, \ldots, n$. (The columns come in some permuted order!) Then
each product is unchanged and $\det A$ stays the same.

Another approach to Question 3: We are multiplying the matrix $A$ by the diagonal
matrix $D = \text{diag}(1 : n)$ when row $i$ is multiplied by $i$. And we are postmultiplying by
$D^{-1}$ when column $j$ is divided by $j$. The determinant of $DAD^{-1}$ is the same as $\det A$
by the product rule.


Problem Set 5.2 


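Problems 1-10 use the big formula with $n!$ terms: $|A| = \sum \pm a_{1\alpha}a_{2\beta}\cdots a_{n\omega}$.


1 Compute the determinants of $A, B, C$ from six terms. Are their rows independent?

$$A = \begin{bmatrix} 1 & 2 & 3 \\ 3 & 1 & 2 \\ 3 & 2 & 1 \end{bmatrix} \qquad B = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 4 & 4 \\ 5 & 6 & 7 \end{bmatrix} \qquad C = \begin{bmatrix} 1 & 1 & 1 \\ 1 & 1 & 0 \\ 1 & 0 & 0 \end{bmatrix}$$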


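2 Compute the determinants of $A, B, C, D$. Are their columns independent?

$$A = \begin{bmatrix} 1 & 1 & 0 \\ 1 & 0 & 1 \\ 0 & 1 & 1 \end{bmatrix} \qquad B = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix} \qquad C = \begin{bmatrix} A & 0 \\ 0 & A \end{bmatrix} \qquad D = \begin{bmatrix} A & 0 \\ 0 & B \end{bmatrix}$$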


3 Show that $\det A = 0$, regardless of the five nonzeros marked by x’s:

$$A = \begin{bmatrix} x & x & x \\ 0 & 0 & x \\ 0 & 0 & x \end{bmatrix}$$

What are the cofactors of row 1? What is the rank of $A$? What are the 6 terms in $\det A$?


4 Find two ways to choose nonzeros from four different rows and columns:

$$A = \begin{bmatrix} 1 & 0 & 0 & 1 \\ 0 & 1 & 1 & 1 \\ 1 & 1 & 0 & 1 \\ 1 & 0 & 0 & 1 \end{bmatrix} \qquad B = \begin{bmatrix} 1 & 0 & 0 & 2 \\ 0 & 3 & 4 & 5 \\ 5 & 4 & 0 & 3 \\ 2 & 0 & 0 & 1 \end{bmatrix} \quad (B \text{ has the same zeros as } A).$$

Is $\det A$ equal to $1 + 1$ or $1 - 1$ or $-1 - 1$? What is $\det B$?


5 Place the smallest number of zeros in a 4 by 4 matrix that will guarantee $\det A = 0$.
Place as many zeros as possible while still allowing $\det A \neq 0$.


6 (a) If $a_{11} = a_{22} = a_{33} = 0$, how many of the six terms in $\det A$ will be zero?

(b) If $a_{11} = a_{22} = a_{33} = a_{44} = 0$, how many of the 24 products $a_{1j}a_{2k}a_{3l}a_{4m}$
are sure to be zero?


7 How many 5 by 5 permutation matrices have det P = +1? Those are even permuta- 
tions. Find one that needs four exchanges to reach the identity matrix. 


8 If det A is not zero, at least one of the n! terms in formula (8) is not zero. Deduce 
from the big formula that some ordering of the rows of A leaves no zeros on the 
diagonal. (Don’t use P from elimination; that PA can have zeros on the diagonal.) 


9 Show that 4 is the largest determinant for a 3 by 3 matrix of 1’s and —1’s. 


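10 How many permutations of (1, 2, 3, 4) are even and what are they? Extra credit:
What are all the possible 4 by 4 determinants of $I + P_{\text{even}}$?


Problems 11-22 use cofactors $C_{ij} = (-1)^{i+j} \det M_{ij}$. Remove row $i$ and column $j$.


11 Find all cofactors and put them into cofactor matrices $C, D$. Find $AC$ and $\det B$.

$$A = \begin{bmatrix} a & b \\ c & d \end{bmatrix} \qquad B = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 0 & 0 \end{bmatrix}$$


12 Find the cofactor matrix $C$ and multiply $A$ times $C^{\mathrm T}$. Compare $AC^{\mathrm T}$ with $A^{-1}$:

$$A = \begin{bmatrix} 2 & -1 & 0 \\ -1 & 2 & -1 \\ 0 & -1 & 2 \end{bmatrix} \qquad A^{-1} = \frac{1}{4} \begin{bmatrix} 3 & 2 & 1 \\ 2 & 4 & 2 \\ 1 & 2 & 3 \end{bmatrix}$$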


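13 The $n$ by $n$ determinant $C_n$ has 1’s above and below the main diagonal:

$$C_1 = \begin{vmatrix} 0 \end{vmatrix} \qquad C_2 = \begin{vmatrix} 0 & 1 \\ 1 & 0 \end{vmatrix} \qquad C_3 = \begin{vmatrix} 0 & 1 & 0 \\ 1 & 0 & 1 \\ 0 & 1 & 0 \end{vmatrix} \qquad C_4 = \begin{vmatrix} 0 & 1 & 0 & 0 \\ 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{vmatrix}$$

(a) What are these determinants $C_1, C_2, C_3, C_4$?

(b) By cofactors find the relation between $C_n$ and $C_{n-1}$ and $C_{n-2}$. Find $C_{10}$.


14 The matrices in Problem 13 have 1’s just above and below the main diagonal. Going
down the matrix, which order of columns (if any) gives all 1’s? Explain why that
permutation is even for $n = 4, 8, 12, \ldots$ and odd for $n = 2, 6, 10, \ldots$. Then

$$C_n = 0 \ (\text{odd } n) \qquad C_n = 1 \ (n = 4, 8, \cdots) \qquad C_n = -1 \ (n = 2, 6, \cdots).$$


15 The tridiagonal 1, 1, 1 matrix of order $n$ has determinant $E_n$:

$$E_1 = \begin{vmatrix} 1 \end{vmatrix} \qquad E_2 = \begin{vmatrix} 1 & 1 \\ 1 & 1 \end{vmatrix} \qquad E_3 = \begin{vmatrix} 1 & 1 & 0 \\ 1 & 1 & 1 \\ 0 & 1 & 1 \end{vmatrix} \qquad E_4 = \begin{vmatrix} 1 & 1 & 0 & 0 \\ 1 & 1 & 1 & 0 \\ 0 & 1 & 1 & 1 \\ 0 & 0 & 1 & 1 \end{vmatrix}.$$

(a) By cofactors show that $E_n = E_{n-1} - E_{n-2}$.

(b) Starting from $E_1 = 1$ and $E_2 = 0$ find $E_3, E_4, \ldots, E_8$.

(c) By noticing how these numbers eventually repeat, find $E_{100}$.


16 $F_n$ is the determinant of the 1, 1, $-1$ tridiagonal matrix of order $n$:

$$F_2 = \begin{vmatrix} 1 & -1 \\ 1 & 1 \end{vmatrix} = 2 \qquad F_3 = \begin{vmatrix} 1 & -1 & 0 \\ 1 & 1 & -1 \\ 0 & 1 & 1 \end{vmatrix} = 3 \qquad F_4 = \begin{vmatrix} 1 & -1 & & \\ 1 & 1 & -1 & \\ & 1 & 1 & -1 \\ & & 1 & 1 \end{vmatrix} \neq 4.$$

Expand in cofactors to show that $F_n = F_{n-1} + F_{n-2}$. These determinants are
Fibonacci numbers 1, 2, 3, 5, 8, 13, .... The sequence usually starts 1, 1, 2, 3 (with
two 1’s) so our $F_n$ is the usual $F_{n+1}$.


17 The matrix $B_n$ is the $-1, 2, -1$ matrix $A_n$ except that $b_{11} = 1$ instead of $a_{11} = 2$.
Using cofactors of the last row of $B_4$ show that $|B_4| = 2|B_3| - |B_2| = 1$.

$$B_4 = \begin{bmatrix} 1 & -1 & & \\ -1 & 2 & -1 & \\ & -1 & 2 & -1 \\ & & -1 & 2 \end{bmatrix} \qquad B_3 = \begin{bmatrix} 1 & -1 & \\ -1 & 2 & -1 \\ & -1 & 2 \end{bmatrix} \qquad B_2 = \begin{bmatrix} 1 & -1 \\ -1 & 2 \end{bmatrix}.$$

The recursion $|B_n| = 2|B_{n-1}| - |B_{n-2}|$ is satisfied when every $|B_n| = 1$. This
recursion is the same as for the $A$’s in Example 6. The difference is in the starting
values 1, 1, 1 for the determinants of sizes $n = 1, 2, 3$.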
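18 Go back to $B_n$ in Problem 17. It is the same as $A_n$ except for $b_{11} = 1$. So use
linearity in the first row, where $[\,1 \;\; -1 \;\; 0\,]$ equals $[\,2 \;\; -1 \;\; 0\,]$ minus $[\,1 \;\; 0 \;\; 0\,]$:

$$|B_n| = \begin{vmatrix} 1 & -1 & 0 \\ -1 & & \\ 0 & & A_{n-1} \end{vmatrix} = \begin{vmatrix} 2 & -1 & 0 \\ -1 & & \\ 0 & & A_{n-1} \end{vmatrix} - \begin{vmatrix} 1 & 0 & 0 \\ -1 & & \\ 0 & & A_{n-1} \end{vmatrix}$$

Linearity gives $|B_n| = |A_n| - |A_{n-1}| =$ ____ .


19 Explain why the 4 by 4 Vandermonde determinant contains $x^3$ but not $x^4$ or $x^5$:

$$V_4 = \det \begin{bmatrix} 1 & a & a^2 & a^3 \\ 1 & b & b^2 & b^3 \\ 1 & c & c^2 & c^3 \\ 1 & x & x^2 & x^3 \end{bmatrix}$$

The determinant is zero at $x =$ ____ , ____ , and ____ . The cofactor of $x^3$ is
$V_3 = (b-a)(c-a)(c-b)$. Then $V_4 = (b-a)(c-a)(c-b)(x-a)(x-b)(x-c)$.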


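20 Find $G_2$ and $G_3$ and then by row operations $G_4$. Can you predict $G_n$?

$$G_2 = \begin{vmatrix} 0 & 1 \\ 1 & 0 \end{vmatrix} \qquad G_3 = \begin{vmatrix} 0 & 1 & 1 \\ 1 & 0 & 1 \\ 1 & 1 & 0 \end{vmatrix} \qquad G_4 = \begin{vmatrix} 0 & 1 & 1 & 1 \\ 1 & 0 & 1 & 1 \\ 1 & 1 & 0 & 1 \\ 1 & 1 & 1 & 0 \end{vmatrix}$$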


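21 Compute $S_1, S_2, S_3$ for these 1, 3, 1 matrices. By Fibonacci guess and check $S_4$.

$$S_1 = \begin{vmatrix} 3 \end{vmatrix} \qquad S_2 = \begin{vmatrix} 3 & 1 \\ 1 & 3 \end{vmatrix} \qquad S_3 = \begin{vmatrix} 3 & 1 & 0 \\ 1 & 3 & 1 \\ 0 & 1 & 3 \end{vmatrix}$$


22 Change 3 to 2 in the upper left corner of the matrices in Problem 21. Why does
that subtract $S_{n-1}$ from the determinant $S_n$? Show that the determinants of the new
matrices become the Fibonacci numbers 2, 5, 13 (always $F_{2n+1}$).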


Problems 23-26 are about block matrices and block determinants.

23 With 2 by 2 blocks in 4 by 4 matrices, you cannot always use block determinants:

$$\begin{vmatrix} A & B \\ 0 & D \end{vmatrix} = |A|\,|D| \quad\text{but}\quad \begin{vmatrix} A & B \\ C & D \end{vmatrix} \neq |A|\,|D| - |C|\,|B|.$$

(a) Why is the first statement true? Somehow $B$ doesn’t enter.
(b) Show by example that equality fails (as shown) when $C$ enters.
(c) Show by example that the answer $\det(AD - CB)$ is also wrong.
24 With block multiplication, $A = LU$ has $A_k = L_k U_k$ in the top left corner:

$$A = \begin{bmatrix} A_k & * \\ * & * \end{bmatrix} = \begin{bmatrix} L_k & 0 \\ * & * \end{bmatrix} \begin{bmatrix} U_k & * \\ 0 & * \end{bmatrix}.$$

(a) Suppose the first three pivots of $A$ are 2, 3, $-1$. What are the determinants of
$L_1, L_2, L_3$ (with diagonal 1’s) and $U_1, U_2, U_3$ and $A_1, A_2, A_3$?
(b) If $A_1, A_2, A_3$ have determinants 5, 6, 7 find the three pivots from equation (3).


25 Block elimination subtracts $CA^{-1}$ times the first row $[\,A \;\; B\,]$ from the second row
$[\,C \;\; D\,]$. This leaves the Schur complement $D - CA^{-1}B$ in the corner:

$$\begin{bmatrix} I & 0 \\ -CA^{-1} & I \end{bmatrix} \begin{bmatrix} A & B \\ C & D \end{bmatrix} = \begin{bmatrix} A & B \\ 0 & D - CA^{-1}B \end{bmatrix}.$$

Take determinants of these block matrices to prove correct rules if $A^{-1}$ exists:

$$\begin{vmatrix} A & B \\ C & D \end{vmatrix} = |A|\,|D - CA^{-1}B| = |AD - CB| \quad\text{provided } AC = CA.$$


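26 If $A$ is $m$ by $n$ and $B$ is $n$ by $m$, block multiplication gives $\det M = \det AB$:

$$M = \begin{bmatrix} 0 & A \\ -B & I \end{bmatrix} = \begin{bmatrix} AB & A \\ 0 & I \end{bmatrix} \begin{bmatrix} I & 0 \\ -B & I \end{bmatrix}.$$

If $A$ is a single row and $B$ is a single column what is $\det M$? If $A$ is a column and $B$
is a row what is $\det M$? Do a 3 by 3 example of each.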


27 (A calculus question) Show that the derivative of $\det A$ with respect to $a_{11}$ is the
cofactor $C_{11}$. The other entries are fixed, so we are only changing $a_{11}$.


Problems 28-33 are about the “big formula” with $n!$ terms.

28 A 3 by 3 determinant has three products “down to the right” and three “down to the
left” with minus signs. Compute the six terms like $(1)(5)(9) = 45$ to find $D$.

$$D = \begin{vmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{vmatrix}$$

Explain without determinants why this particular matrix is or is not invertible.


29 For E4 in Problem 15, five of the 4! = 24 terms in the big formula (8) are nonzero. 
Find those five terms to show that E4 = —1. 


30 For the 4 by 4 tridiagonal second difference matrix (entries $-1, 2, -1$) find the five
terms in the big formula that give $\det A = 16 - 4 - 4 - 4 + 1$.
31 Find the determinant of this cyclic $P$ by cofactors of row 1 and then the “big formula”.
How many exchanges reorder 4, 1, 2, 3 into 1, 2, 3, 4? Is $|P^2| = 1$ or $-1$?

$$P = \begin{bmatrix} 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix} \qquad P^2 = \begin{bmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{bmatrix} = \begin{bmatrix} 0 & I \\ I & 0 \end{bmatrix}.$$

Challenge Problems


32 Cofactors of the 1, 3, 1 matrices in Problem 21 give a recursion $S_n = 3S_{n-1} - S_{n-2}$.
Amazingly that recursion produces every second Fibonacci number. Here is the challenge.

Show that $S_n$ is the Fibonacci number $F_{2n+2}$ by proving $F_{2n+2} = 3F_{2n} - F_{2n-2}$.
Keep using Fibonacci’s rule $F_k = F_{k-1} + F_{k-2}$ starting with $k = 2n + 2$.


33 The symmetric Pascal matrices have determinant 1. If I subtract 1 from the $n, n$
entry, why does the determinant become zero? (Use rule 3 or cofactors.)

$$\det \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 2 & 3 & 4 \\ 1 & 3 & 6 & 10 \\ 1 & 4 & 10 & 20 \end{bmatrix} = 1 \ (\text{known}) \qquad \det \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 2 & 3 & 4 \\ 1 & 3 & 6 & 10 \\ 1 & 4 & 10 & 19 \end{bmatrix} = 0 \ (\text{to explain}).$$


34 This problem shows in two ways that $\det A = 0$ (the x’s are any numbers):

$$A = \begin{bmatrix} x & x & x & x & x \\ x & x & x & x & x \\ 0 & 0 & 0 & x & x \\ 0 & 0 & 0 & x & x \\ 0 & 0 & 0 & x & x \end{bmatrix}$$

(a) How do you know that the rows are linearly dependent?
(b) Explain why all 120 terms are zero in the big formula for $\det A$.


35 If $|\det(A)| > 1$, prove that the powers $A^n$ cannot stay bounded. But if $|\det(A)| \leq 1$,
show that some entries of $A^n$ might still grow large. Eigenvalues will give the right
test for stability, determinants tell us only one number.
5.3 Cramer’s Rule, Inverses, and Volumes


This section solves $Ax = b$ by algebra and not by elimination. We also invert $A$. In the
entries of $A^{-1}$, you will see $\det A$ in every denominator: we divide by it. (If $\det A = 0$
then we can’t divide and $A^{-1}$ doesn’t exist.) Each entry in $A^{-1}$ and $A^{-1}b$ is a determinant
divided by the determinant of $A$.


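Cramer’s Rule solves $Ax = b$. A neat idea gives the first component $x_1$. Replacing the
first column of $I$ by $x$ gives a matrix with determinant $x_1$. When you multiply it by $A$, the
first column becomes $Ax$ which is $b$. The other columns are copied from $A$:

$$\text{Key idea}\qquad A \begin{bmatrix} x_1 & 0 & 0 \\ x_2 & 1 & 0 \\ x_3 & 0 & 1 \end{bmatrix} = \begin{bmatrix} b_1 & a_{12} & a_{13} \\ b_2 & a_{22} & a_{23} \\ b_3 & a_{32} & a_{33} \end{bmatrix} = B_1. \qquad (1)$$

We multiplied a column at a time. Take determinants of the three matrices:

$$\text{Product rule}\qquad (\det A)(x_1) = \det B_1 \quad\text{or}\quad x_1 = \frac{\det B_1}{\det A}. \qquad (2)$$

This is the first component of $x$ in Cramer’s Rule! Changing a column of $A$ gives $B_1$.
To find $x_2$, put the vector $x$ into the second column of the identity matrix:

$$\text{Same idea}\qquad \begin{bmatrix} a_1 & a_2 & a_3 \end{bmatrix} \begin{bmatrix} 1 & x_1 & 0 \\ 0 & x_2 & 0 \\ 0 & x_3 & 1 \end{bmatrix} = \begin{bmatrix} a_1 & b & a_3 \end{bmatrix} = B_2. \qquad (3)$$

Take determinants to find $(\det A)(x_2) = \det B_2$. This gives $x_2$ in Cramer’s Rule:

$$x_1 = \frac{\det B_1}{\det A} \qquad x_2 = \frac{\det B_2}{\det A} \qquad \ldots \qquad x_n = \frac{\det B_n}{\det A}$$

The matrix $B_j$ has column $j$ of $A$ replaced by the vector $b$, and the rule requires $\det A \neq 0$.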


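Example 1 Solving $3x_1 + 4x_2 = 2$ and $5x_1 + 6x_2 = 4$ needs three determinants:

$$\det A = \begin{vmatrix} 3 & 4 \\ 5 & 6 \end{vmatrix} \qquad \det B_1 = \begin{vmatrix} 2 & 4 \\ 4 & 6 \end{vmatrix} \qquad \det B_2 = \begin{vmatrix} 3 & 2 \\ 5 & 4 \end{vmatrix}$$

Those determinants are $-2$ and $-4$ and $2$. All ratios divide by $\det A$:

$$\text{Cramer’s Rule}\quad x_1 = \frac{-4}{-2} = 2 \qquad x_2 = \frac{2}{-2} = -1 \qquad \text{Check}\quad \begin{bmatrix} 3 & 4 \\ 5 & 6 \end{bmatrix} \begin{bmatrix} 2 \\ -1 \end{bmatrix} = \begin{bmatrix} 2 \\ 4 \end{bmatrix}$$

To solve an $n$ by $n$ system, Cramer’s Rule evaluates $n + 1$ determinants (of $A$ and the
$n$ different $B$’s). When each one is the sum of $n!$ terms, applying the “big formula” with
all permutations, this makes a total of $(n + 1)!$ terms. It would be crazy to solve equations
that way. But we do finally have an explicit formula for the solution $x$.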
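For small systems the rule is easy to try. The sketch below is illustrative only: the function names are invented, the determinant is computed by a plain cofactor expansion, and exact fractions are used so the answers are not blurred by rounding. It reproduces Example 1.

```python
from fractions import Fraction

def det(a):
    """Cofactor expansion along the first row (small matrices only)."""
    if not a:
        return 1  # empty determinant, so that a 1 by 1 matrix returns its entry
    return sum((-1) ** j * a[0][j] * det([row[:j] + row[j + 1:] for row in a[1:]])
               for j in range(len(a)) if a[0][j] != 0)

def cramer(a, b):
    """Solve Ax = b by x_j = det B_j / det A, where B_j has column j replaced by b."""
    d = det(a)
    if d == 0:
        raise ValueError("det A = 0: Cramer's Rule does not apply")
    n = len(a)
    x = []
    for j in range(n):
        b_j = [row[:j] + [b[i]] + row[j + 1:] for i, row in enumerate(a)]
        x.append(Fraction(det(b_j), d))
    return x

print(cramer([[3, 4], [5, 6]], [2, 4]))  # the system of Example 1
```

Elimination remains the practical method. The value of the rule is the explicit formula for each component.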
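Example 2 Cramer’s Rule is inefficient for numbers but it is well suited to letters. For
$n = 2$, find the columns of $A^{-1}$ by solving $AA^{-1} = I$:

$$\text{Columns of } I \qquad \begin{bmatrix} a & b \\ c & d \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 1 \\ 0 \end{bmatrix} \qquad \begin{bmatrix} a & b \\ c & d \end{bmatrix} \begin{bmatrix} y_1 \\ y_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 1 \end{bmatrix}$$

Those share the same $A$. We need five determinants for $x_1, x_2, y_1, y_2$:

$$\begin{vmatrix} a & b \\ c & d \end{vmatrix} \quad\text{and}\quad \begin{vmatrix} 1 & b \\ 0 & d \end{vmatrix} \quad \begin{vmatrix} a & 1 \\ c & 0 \end{vmatrix} \quad \begin{vmatrix} 0 & b \\ 1 & d \end{vmatrix} \quad \begin{vmatrix} a & 0 \\ c & 1 \end{vmatrix}$$

The last four are $d$, $-c$, $-b$, and $a$. (They are the cofactors!) Here is $A^{-1}$:

$$x_1 = \frac{d}{|A|}, \quad x_2 = \frac{-c}{|A|}, \quad y_1 = \frac{-b}{|A|}, \quad y_2 = \frac{a}{|A|}, \quad\text{and then}\quad A^{-1} = \frac{1}{ad - bc} \begin{bmatrix} d & -b \\ -c & a \end{bmatrix}.$$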


I chose 2 by 2 so that the main points could come through clearly. The new idea is the
appearance of the cofactors. When the right side is a column of the identity matrix $I$, the
determinant of each matrix $B_j$ in Cramer’s Rule is a cofactor.

You can see those cofactors for $n = 3$. Solve $AA^{-1} = I$ (first column only):

$$\text{Determinants = Cofactors of } A \qquad \begin{vmatrix} 1 & a_{12} & a_{13} \\ 0 & a_{22} & a_{23} \\ 0 & a_{32} & a_{33} \end{vmatrix} \quad \begin{vmatrix} a_{11} & 1 & a_{13} \\ a_{21} & 0 & a_{23} \\ a_{31} & 0 & a_{33} \end{vmatrix} \quad \begin{vmatrix} a_{11} & a_{12} & 1 \\ a_{21} & a_{22} & 0 \\ a_{31} & a_{32} & 0 \end{vmatrix} \qquad (5)$$

That first determinant $|B_1|$ is the cofactor $C_{11}$. The second determinant $|B_2|$ is the cofactor
$C_{12}$. Notice that the correct minus sign appears in $-(a_{21}a_{33} - a_{23}a_{31})$. This cofactor $C_{12}$
goes into the 2, 1 entry of $A^{-1}$, in the first column! So we transpose the cofactor matrix, and
as always we divide by $\det A$.


The $i, j$ entry of $A^{-1}$ is the cofactor $C_{ji}$ (not $C_{ij}$) divided by $\det A$:

$$\text{Formula for } A^{-1} \qquad (A^{-1})_{ij} = \frac{C_{ji}}{\det A} \quad\text{and}\quad A^{-1} = \frac{C^{\mathrm T}}{\det A}. \qquad (6)$$

The cofactors $C_{ij}$ go into the “cofactor matrix” $C$. Its transpose leads to $A^{-1}$. To compute
the $i, j$ entry of $A^{-1}$, cross out row $j$ and column $i$ of $A$. Multiply the determinant by
$(-1)^{i+j}$ to get the cofactor, and divide by $\det A$.

Check this rule for the 3, 1 entry of $A^{-1}$. This is in column 1 so we solve $Ax = (1, 0, 0)$.
The third component $x_3$ needs the third determinant in equation (5), divided by $\det A$. That
third determinant is exactly the cofactor $C_{13} = a_{21}a_{32} - a_{22}a_{31}$. So $(A^{-1})_{31} = C_{13}/\det A$
(2 by 2 determinant divided by 3 by 3).

Summary In solving $AA^{-1} = I$, the columns of $I$ lead to the columns of $A^{-1}$. Then
Cramer’s Rule using $b$ = columns of $I$ gives the short formula (6) for $A^{-1}$.
Direct proof of the formula $A^{-1} = C^{\mathrm T}/\det A$ The idea is to multiply $A$ times $C^{\mathrm T}$:

$$\begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix} \begin{bmatrix} C_{11} & C_{21} & C_{31} \\ C_{12} & C_{22} & C_{32} \\ C_{13} & C_{23} & C_{33} \end{bmatrix} = \begin{bmatrix} \det A & 0 & 0 \\ 0 & \det A & 0 \\ 0 & 0 & \det A \end{bmatrix}. \qquad (7)$$

Row 1 of $A$ times column 1 of the cofactors yields the first $\det A$ on the right:

$$a_{11}C_{11} + a_{12}C_{12} + a_{13}C_{13} = \det A \quad\text{by the cofactor rule.}$$

Similarly row 2 of $A$ times column 2 of $C^{\mathrm T}$ (transpose) yields $\det A$. The entries $a_{2j}$ are
multiplying cofactors $C_{2j}$ as they should, to give the determinant.


How to explain the zeros off the main diagonal in equation (7)? Rows of $A$ are multiplying
cofactors from different rows. Why is the answer zero?

$$\text{Row 2 of } A, \text{ row 1 of } C \qquad a_{21}C_{11} + a_{22}C_{12} + a_{23}C_{13} = 0. \qquad (8)$$

Answer: This is the cofactor rule for a new matrix, when the second row of $A$ is copied into
its first row. The new matrix $A^*$ has two equal rows, so $\det A^* = 0$ in equation (8). Notice
that $A^*$ has the same cofactors $C_{11}, C_{12}, C_{13}$ as $A$, because all rows agree after the first
row. Thus the remarkable multiplication (7) is correct:

$$AC^{\mathrm T} = (\det A) I \quad\text{or}\quad A^{-1} = \frac{C^{\mathrm T}}{\det A}.$$
Example 3 The “sum matrix” $A$ has determinant 1. Then $A^{-1}$ contains cofactors:

$$A = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ 1 & 1 & 1 & 0 \\ 1 & 1 & 1 & 1 \end{bmatrix} \quad\text{has inverse}\quad A^{-1} = \frac{C^{\mathrm T}}{1} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ -1 & 1 & 0 & 0 \\ 0 & -1 & 1 & 0 \\ 0 & 0 & -1 & 1 \end{bmatrix}.$$

Cross out row 1 and column 1 of $A$ to see the 3 by 3 cofactor $C_{11} = 1$. Now cross out row
1 and column 2 for $C_{12}$. The 3 by 3 submatrix is still triangular with determinant 1. But
the cofactor $C_{12}$ is $-1$ because of the sign $(-1)^{1+2}$. This number $-1$ goes into the (2, 1)
entry of $A^{-1}$. Don’t forget to transpose $C$.

The inverse of a triangular matrix is triangular. Cofactors give a reason why. 
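Formula (6) can be carried out literally for a small matrix. This is a sketch under stated assumptions: the helper names are invented here, determinants come from a simple cofactor expansion, and exact fractions keep the division by $\det A$ exact. The sum matrix of Example 3 serves as the test case.

```python
from fractions import Fraction

def minor(a, i, j):
    """Remove row i and column j (counted from 0)."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(a) if k != i]

def det(a):
    """Cofactor expansion along the first row (small matrices only)."""
    if not a:
        return 1  # empty determinant, so that a 1 by 1 matrix returns its entry
    return sum((-1) ** j * a[0][j] * det(minor(a, 0, j))
               for j in range(len(a)) if a[0][j] != 0)

def cofactor_matrix(a):
    """C[i][j] = (-1)^(i+j) det M_ij."""
    n = len(a)
    return [[(-1) ** (i + j) * det(minor(a, i, j)) for j in range(n)]
            for i in range(n)]

def inverse(a):
    """A^{-1} = C^T / det A: entry (i, j) of the inverse is C[j][i] / det A."""
    d = det(a)
    if d == 0:
        raise ValueError("det A = 0, so A has no inverse")
    c = cofactor_matrix(a)
    n = len(a)
    return [[Fraction(c[j][i], d) for j in range(n)] for i in range(n)]

sum_matrix = [[1, 0, 0, 0], [1, 1, 0, 0], [1, 1, 1, 0], [1, 1, 1, 1]]
for row in inverse(sum_matrix):
    print([int(entry) for entry in row])
```

The transposition happens in the index swap `c[j][i]`. Forgetting it would put $-1$ above the diagonal instead of below.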


Example 4 If all cofactors are nonzero, is A sure to be invertible? No way. 


Area of a Triangle 


Everybody knows the area of a rectangle—base times height. The area of a triangle is half 
the base times the height. But here is a question that those formulas don’t answer. If we 
know the corners (x1, y1) and (x2, y2) and (x3, y3) of a triangle, what is the area? 
Using the corners to find the base and height is not a good way. 
Figure 5.1: General triangle; special triangle from (0, 0); general from three specials.


Determinants are much better. The square roots in the base and height cancel out in the
good formula. The area of a triangle is half of a 3 by 3 determinant. If one corner is at
the origin, say $(x_3, y_3) = (0, 0)$, the determinant is only 2 by 2.

$$\text{Area of triangle} = \frac{1}{2} \begin{vmatrix} x_1 & y_1 & 1 \\ x_2 & y_2 & 1 \\ x_3 & y_3 & 1 \end{vmatrix} \qquad \text{Area} = \frac{1}{2} \begin{vmatrix} x_1 & y_1 \\ x_2 & y_2 \end{vmatrix} \ \text{when } (x_3, y_3) = (0, 0).$$


When you set $x_3 = y_3 = 0$ in the 3 by 3 determinant, you get the 2 by 2 determinant. These
formulas have no square roots, so they are reasonable to memorize. The 3 by 3 determinant
breaks into a sum of three 2 by 2’s, just as the third triangle in Figure 5.1 breaks into three
special triangles from (0, 0):

$$\text{Cofactors of column 3} \qquad \text{Area} = \frac{1}{2} \begin{vmatrix} x_1 & y_1 & 1 \\ x_2 & y_2 & 1 \\ x_3 & y_3 & 1 \end{vmatrix} = \begin{aligned} &+\tfrac{1}{2}(x_1 y_2 - x_2 y_1) \\ &+\tfrac{1}{2}(x_2 y_3 - x_3 y_2) \\ &+\tfrac{1}{2}(x_3 y_1 - x_1 y_3). \end{aligned} \qquad (9)$$

If (0, 0) is outside the triangle, two of the special areas can be negative, but the sum is still
correct. The real problem is to explain the special area $\tfrac{1}{2}(x_1 y_2 - x_2 y_1)$.


Why is this the area of a triangle? We can remove the factor $\tfrac{1}{2}$ and change to a
parallelogram (twice as big, because the parallelogram contains two equal triangles). We now
prove that the parallelogram area is the determinant $x_1 y_2 - x_2 y_1$. This area in Figure 5.2
is 11, and therefore the triangle has area $\tfrac{11}{2}$.
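The corner formula is short enough to write out as a function. In this sketch the function name is invented, the three 2 by 2 pieces are summed as in equation (9), and the absolute value is taken at the end because the signed result depends on the order of the corners. The corners (4, 1) and (1, 3) are the ones shown in Figure 5.2.

```python
def triangle_area(p1, p2, p3):
    """Half the absolute value of the 3 by 3 determinant with rows (x, y, 1)."""
    (x1, y1), (x2, y2), (x3, y3) = p1, p2, p3
    signed = 0.5 * ((x1 * y2 - x2 * y1)
                    + (x2 * y3 - x3 * y2)
                    + (x3 * y1 - x1 * y3))
    return abs(signed)

print(triangle_area((4, 1), (1, 3), (0, 0)))  # triangle of Figure 5.2
print(triangle_area((5, 2), (2, 4), (1, 1)))  # the same triangle shifted by (1, 1)
```

Shifting every corner by the same vector leaves the area unchanged, which is one quick check on the 3 by 3 formula.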


Proof that a parallelogram starting from (0,0) has area = 2 by 2 determinant. 
Parallelogram with sides (4, 1) and (1, 3) from (0, 0): Area $= \begin{vmatrix} 4 & 1 \\ 1 & 3 \end{vmatrix} = 11$. Triangle: Area $= \tfrac{11}{2}$.

Figure 5.2: A triangle is half of a parallelogram. Area is half of a determinant.


There are many proofs but this one fits with the book. We show that the area has the same
properties 1-2-3 as the determinant. Then area = determinant! Remember that those three
rules defined the determinant and led to all its other properties.


1 When $A = I$, the parallelogram becomes the unit square. Its area is $\det I = 1$.


2 When rows are exchanged, the determinant reverses sign. The absolute value (positive
area) stays the same, because it is the same parallelogram.


3 If row 1 is multiplied by $t$, Figure 5.3a shows that the area is also multiplied by $t$. Suppose
a new row $(x_1', y_1')$ is added to $(x_1, y_1)$ (keeping row 2 fixed). Figure 5.3b shows
that the solid parallelogram areas add to the dotted parallelogram area (because the two
triangles completed by dotted lines are the same).


(a) Full area $= tA$. (b) Dotted area = Solid area $= A + A'$.

Figure 5.3: Areas obey the rule of linearity (keeping the side $(x_2, y_2)$ constant).


That is an exotic proof, when we could use plane geometry. But the proof has a major
attraction: it applies in $n$ dimensions. The $n$ edges going out from the origin are given by
the rows of an $n$ by $n$ matrix. The box is completed by more edges, just like the parallelogram.

Figure 5.4 shows a three-dimensional box whose edges are not at right angles. The
volume equals the absolute value of $\det A$. Our proof checks again that rules 1-3 for
determinants are also obeyed by volumes. When an edge is stretched by a factor $t$, the
volume is multiplied by $t$. When edge 1 is added to edge $1'$, the new box has edge $1 + 1'$.
Its volume is the sum of the two original volumes. This is Figure 5.3b lifted into
three dimensions or $n$ dimensions. I would draw the boxes but this paper is only
two-dimensional.


Edges from the origin: $(a_{11}, a_{12}, a_{13})$, $(a_{21}, a_{22}, a_{23})$, $(a_{31}, a_{32}, a_{33})$. Volume of box = |determinant|.

Figure 5.4: Three-dimensional box formed from the three rows of $A$.


The unit cube has volume = 1, which is det 7. Row exchanges or edge exchanges leave 
the same box and the same absolute volume. The determinant changes sign, to indicate 
whether the edges are a right-handed triple (det A > 0) or a left-handed triple (det A < 0). 
The box volume follows the rules for determinants, so volume of the box = absolute value 
of the determinant. 


Example 5 Suppose a rectangular box (90° angles) has side lengths r,s, and t. Its 
volume is r times s times ¢. The diagonal matrix with entries r,s, and £ produces those 
three sides. Then det A also equals r st. 


Example 6 In calculus, the box is infinitesimally small! To integrate over a circle, we
might change x and y to r and θ. Those are polar coordinates: x = r cos θ and y = r sin θ.
The area of a “polar box” is a determinant J times dr dθ:

$$ J = \begin{vmatrix} \partial x/\partial r & \partial x/\partial\theta \\ \partial y/\partial r & \partial y/\partial\theta \end{vmatrix} = \begin{vmatrix} \cos\theta & -r\sin\theta \\ \sin\theta & r\cos\theta \end{vmatrix} = r. $$

This determinant is the r in the small area dA = r dr dθ. The stretching factor J goes
into double integrals just as dx/du goes into an ordinary integral ∫ dx = ∫ (dx/du) du.
For triple integrals the Jacobian matrix J with nine derivatives will be 3 by 3.
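
As a sanity check on J = r, the four partial derivatives can be estimated numerically. This is a sketch under stated assumptions: the function name polar_jacobian, the step size h, and the sample points are illustrative choices, and central differences only approximate the derivatives.

```python
import math


def polar_jacobian(r, theta, h=1e-6):
    """Estimate J = det of the 2 by 2 matrix of partial derivatives of
    (x, y) = (r cos(theta), r sin(theta)) using central differences."""
    def x(r, th):
        return r * math.cos(th)

    def y(r, th):
        return r * math.sin(th)

    dx_dr = (x(r + h, theta) - x(r - h, theta)) / (2 * h)
    dx_dth = (x(r, theta + h) - x(r, theta - h)) / (2 * h)
    dy_dr = (y(r + h, theta) - y(r - h, theta)) / (2 * h)
    dy_dth = (y(r, theta + h) - y(r, theta - h)) / (2 * h)
    return dx_dr * dy_dth - dx_dth * dy_dr


for r, theta in [(1.0, 0.3), (2.0, 1.2), (0.5, 2.5)]:
    # The second printed number should be close to r, whatever theta is.
    print(r, round(polar_jacobian(r, theta), 6))
```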
The Cross Product


The cross product is an extra (and optional) application, special for three dimensions. Start 
with vectors u = (u1, u2, u3) and v = (v1, v2, v3). Unlike the dot product, which is a
number, the cross product is a vector, also in three dimensions. It is written u × v and
pronounced “u cross v.” The components of this cross product are just 2 by 2 cofactors.
We will explain the properties that make u x v useful in geometry and physics. 

This time we bite the bullet, and write down the formula before the properties. 


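DEFINITION The cross product of u = (u1, u2, u3) and v = (v1, v2, v3) is a vector

$$ u \times v = \begin{vmatrix} i & j & k \\ u_1 & u_2 & u_3 \\ v_1 & v_2 & v_3 \end{vmatrix} = (u_2v_3 - u_3v_2)\,i + (u_3v_1 - u_1v_3)\,j + (u_1v_2 - u_2v_1)\,k. \quad (10) $$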


Comment The 3 by 3 determinant is the easiest way to remember u x v. It is not especially 
legal, because the first row contains vectors i, j,k and the other rows contain numbers. 
In the determinant, the vector i = (1, 0, 0) multiplies u2v3 and −u3v2. The result is
(u2v3 − u3v2, 0, 0), which displays the first component of the cross product.

Notice the cyclic pattern of the subscripts: 2 and 3 give component 1 of u x v, then 3 
and 1 give component 2, then 1 and 2 give component 3. This completes the definition of 
u x v. Now we list the properties of the cross product: 


Property 1 v x u reverses rows 2 and 3 in the determinant so it equals —(u x v). 


Property 2 The cross product u x v is perpendicular to u (and also to v). The direct proof 
is to watch terms cancel. Perpendicularity is a zero dot product: 


$$ u \cdot (u \times v) = u_1(u_2v_3 - u_3v_2) + u_2(u_3v_1 - u_1v_3) + u_3(u_1v_2 - u_2v_1) = 0. \quad (11) $$


The determinant now has rows u, u and v so it is zero. 


Property 3 The cross product of any vector with itself (two equal rows) is u × u = 0.
When u and v are parallel, the cross product is zero. When u and v are perpendicular, the
dot product is zero. One involves sin θ and the other involves cos θ:

$$ \|u \times v\| = \|u\|\,\|v\|\,|\sin\theta| \quad\text{and}\quad |u \cdot v| = \|u\|\,\|v\|\,|\cos\theta|. \quad (12) $$


Example 7 Since u = (3, 2, 0) and v = (1, 4, 0) are in the xy plane, u × v goes up the
z axis:

$$ u \times v = \begin{vmatrix} i & j & k \\ 3 & 2 & 0 \\ 1 & 4 & 0 \end{vmatrix} = 10k. \qquad \text{The cross product is } u \times v = (0, 0, 10). $$

The length of u × v equals the area of the parallelogram with sides u and v. This will be
important: In this example the area is 10.


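Example 8 The cross product of u = (1, 1, 1) and v = (1, 1, 2) is (1, −1, 0):

$$ \begin{vmatrix} i & j & k \\ 1 & 1 & 1 \\ 1 & 1 & 2 \end{vmatrix} = i\begin{vmatrix} 1 & 1 \\ 1 & 2 \end{vmatrix} - j\begin{vmatrix} 1 & 1 \\ 1 & 2 \end{vmatrix} + k\begin{vmatrix} 1 & 1 \\ 1 & 1 \end{vmatrix} = i - j. $$

This vector (1, −1, 0) is perpendicular to (1, 1, 1) and (1, 1, 2) as predicted. Area = √2.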
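
Equation (10) and Properties 1 and 2 translate directly into code. The following is a minimal Python sketch, assuming vectors are stored as tuples of three numbers; the function names cross, dot, and length are illustrative, not from the text.

```python
import math


def cross(u, v):
    """Cross product from equation (10): subscripts follow the cyclic pattern 23, 31, 12."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def dot(u, v):
    """Ordinary dot product of two real vectors."""
    return sum(a * b for a, b in zip(u, v))


def length(u):
    return math.sqrt(dot(u, u))


print(cross((3, 2, 0), (1, 4, 0)))            # (0, 0, 10), as in Example 7
w = cross((1, 1, 1), (1, 1, 2))
print(w)                                      # (1, -1, 0), as in Example 8
print(dot(w, (1, 1, 1)), dot(w, (1, 1, 2)))   # 0 0, Property 2
print(length(w))                              # the parallelogram area, sqrt(2)
print(cross((1, 4, 0), (3, 2, 0)))            # (0, 0, -10), Property 1
```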


Example 9 The cross product of (1, 0, 0) and (0, 1, 0) obeys the right hand rule. It goes
up not down:

$$ i \times j = \begin{vmatrix} i & j & k \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{vmatrix} = k $$

Rule: u × v points along your right thumb when the fingers curl from u to v.

Thus i × j = k. The right hand rule also gives j × k = i and k × i = j. Note the cyclic
order. In the opposite order (anti-cyclic) the thumb is reversed and the cross product goes
the other way: k × j = −i and i × k = −j and j × i = −k. You see the three plus signs
and three minus signs from a 3 by 3 determinant.

The definition of u x v can be based on vectors instead of their components: 


The cross product is a vector with length ||u|| ||v|| |sin θ|. Its direction is
perpendicular to u and v, fixed by the right hand rule.


This definition appeals to physicists, who hate to choose axes and coordinates. They see 
(u1, u2, u3) as the position of a mass and (Fx, Fy, Fz) as a force acting on it. If F is
parallel to u, then u × F = 0 and there is no turning. The cross product u × F is the turning
force or torque. It points along the turning axis (perpendicular to u and F). Its length
||u|| ||F|| sin θ measures the “moment” that produces turning.


Triple Product = Determinant = Volume 


Since u × v is a vector, we can take its dot product with a third vector w. That produces
the triple product (u × v) · w. It is called a “scalar” triple product, because it is a number.
In fact it is a determinant. It gives the volume of the u, v, w box:

$$ \text{Triple product} \qquad (u \times v) \cdot w = \begin{vmatrix} w_1 & w_2 & w_3 \\ u_1 & u_2 & u_3 \\ v_1 & v_2 & v_3 \end{vmatrix} = \begin{vmatrix} u_1 & u_2 & u_3 \\ v_1 & v_2 & v_3 \\ w_1 & w_2 & w_3 \end{vmatrix}. \quad (13) $$

We can put w in the top or bottom row. The two determinants are the same because
two row exchanges go from one to the other. Notice when this determinant is zero:


(u × v) · w = 0 exactly when the vectors u, v, w lie in the same plane.

First reason: u × v is perpendicular to that plane so its dot product with w is zero.
Second reason: Three vectors in a plane are dependent. The matrix is singular (det = 0).
Third reason: Zero volume when the u, v, w box is squashed onto a plane.


It is remarkable that (u × v) · w equals the volume of the box with sides u, v, w. This
3 by 3 determinant carries tremendous information. Like ad − bc for a 2 by 2 matrix, it
separates invertible from singular. Chapter 6 will be looking for singular.
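
Equation (13) is easy to test on numbers. In this minimal Python sketch the vectors u, v, w are arbitrary illustrative choices (they do not come from the text), and cross, dot, and det3 are hypothetical helper names.

```python
def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def det3(a):
    """3 by 3 determinant by cofactors of the first row."""
    (a11, a12, a13), (a21, a22, a23), (a31, a32, a33) = a
    return (a11 * (a22 * a33 - a23 * a32)
            - a12 * (a21 * a33 - a23 * a31)
            + a13 * (a21 * a32 - a22 * a31))


u, v, w = (1, 0, 2), (0, 3, 1), (2, 1, 0)   # illustrative vectors
triple = dot(cross(u, v), w)
# The triple product agrees with the determinant, with w in the bottom or the top row.
print(triple, det3([u, v, w]), det3([w, u, v]))   # -13 -13 -13
print(abs(triple))                                 # 13, the volume of the box

# A vector in the plane of u and v gives zero: the box is squashed flat.
w_flat = tuple(a + b for a, b in zip(u, v))
print(dot(cross(u, v), w_flat), det3([u, v, w_flat]))   # 0 0
```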


REVIEW OF THE KEY IDEAS

1. Cramer’s Rule solves Ax = b by ratios like x1 = |B1|/|A| = |b a2 ··· an|/|A|.
2. When C is the cofactor matrix for A, the inverse is A⁻¹ = Cᵀ / det A.
3. The volume of a box is |det A|, when the box edges are the rows of A.
4. Area and volume are needed to change variables in double and triple integrals.
5. In R³, the cross product u × v is perpendicular to u and v.


WORKED EXAMPLES


5.3 A If A is singular, the equation ACᵀ = (det A)I becomes ACᵀ = zero matrix.
Then each column of Cᵀ is in the nullspace of A. Those columns contain cofactors along
rows of A. So the cofactors quickly find the nullspace of a 3 by 3 matrix (my apologies
that this comes so late!).

Solve Ax = 0 by x = cofactors along a row, for these singular matrices of rank 2:

$$ \text{Cofactors give nullspace} \qquad A = \begin{bmatrix} 1 & 4 & 7 \\ 2 & 3 & 9 \\ 2 & 2 & 8 \end{bmatrix} \qquad A = \begin{bmatrix} 1 & 1 & 2 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix} $$

Any nonzero column of Cᵀ will give the desired solution to Ax = 0. With rank 2,
A has at least one nonzero cofactor. If A has rank 1 we get x = 0 and the idea fails.

Solution The first matrix has these cofactors along its top row (note each minus sign):

$$ \begin{vmatrix} 3 & 9 \\ 2 & 8 \end{vmatrix} = 6 \qquad -\begin{vmatrix} 2 & 9 \\ 2 & 8 \end{vmatrix} = 2 \qquad \begin{vmatrix} 2 & 3 \\ 2 & 2 \end{vmatrix} = -2 $$


Then x = (6,2, —2) solves Ax = 0. The cofactors along the second row are (—18, —6, 6) 
which is just —3x. This is also in the one-dimensional nullspace of A. 
The second matrix has zero cofactors along its first row. The nullvector x = (0, 0,0) 
is not interesting. The cofactors of row 2 give x = (1,—1,0) which solves Ax = 0. 
Every n by n matrix of rank n — 1 has at least one nonzero cofactor by Problem 3.3.12. 
But for rank n — 2, all cofactors are zero and we only find x = 0. 
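
The cofactor shortcut for the nullspace can be reproduced in a few lines. This is a minimal Python sketch for 3 by 3 matrices only; the names cofactor, cofactors_along_row, and matvec are hypothetical helpers, and rows and columns are indexed from 0.

```python
def cofactor(a, i, j):
    """Cofactor C_ij of a 3 by 3 matrix: the signed 2 by 2 determinant
    left after removing row i and column j (both counted from 0)."""
    rows = [r for k, r in enumerate(a) if k != i]
    m = [[x for l, x in enumerate(r) if l != j] for r in rows]
    minor = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return (-1) ** (i + j) * minor


def cofactors_along_row(a, i):
    return [cofactor(a, i, j) for j in range(3)]


def matvec(a, x):
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


A1 = [[1, 4, 7], [2, 3, 9], [2, 2, 8]]
A2 = [[1, 1, 2], [1, 1, 1], [1, 1, 1]]

x = cofactors_along_row(A1, 0)
print(x, matvec(A1, x))               # [6, 2, -2] [0, 0, 0]
print(cofactors_along_row(A1, 1))     # [-18, -6, 6], which is -3x
print(cofactors_along_row(A2, 0))     # [0, 0, 0], not interesting
y = cofactors_along_row(A2, 1)
print(y, matvec(A2, y))               # [1, -1, 0] [0, 0, 0]
```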


5.3 B Use Cramer’s Rule with ratios det B_j / det A to solve Ax = b. Also find the
inverse matrix A⁻¹ = Cᵀ / det A. Why is the solution x for this b the same as column 3 of
A⁻¹? Which cofactors are involved in computing that column x?

$$ Ax = b \quad\text{is}\quad \begin{bmatrix} 2 & 6 & 2 \\ 1 & 4 & 2 \\ 5 & 9 & 0 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix} $$

Find the volumes of the boxes whose edges are columns of A and then rows of A⁻¹.


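Solution The determinants of the B_j (with right side b placed in column j) are

$$ |B_1| = \begin{vmatrix} 0 & 6 & 2 \\ 0 & 4 & 2 \\ 1 & 9 & 0 \end{vmatrix} = 4 \qquad |B_2| = \begin{vmatrix} 2 & 0 & 2 \\ 1 & 0 & 2 \\ 5 & 1 & 0 \end{vmatrix} = -2 \qquad |B_3| = \begin{vmatrix} 2 & 6 & 0 \\ 1 & 4 & 0 \\ 5 & 9 & 1 \end{vmatrix} = 2. $$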


Those are cofactors C31, C32, C33 of row 3. Their dot product with row 3 is det A:

det A = a31 C31 + a32 C32 + a33 C33 = (5, 9, 0) · (4, −2, 2) = 2.

The three ratios det B_j / det A give the three components of x = (2, −1, 1). This x is the
third column of A⁻¹ because b = (0, 0, 1) is the third column of I. The cofactors along
the other rows of A, divided by det A = 2, give the other columns of A⁻¹:

$$ A^{-1} = \frac{C^T}{\det A} = \frac{1}{2}\begin{bmatrix} -18 & 18 & 4 \\ 10 & -10 & -2 \\ -11 & 12 & 2 \end{bmatrix}. \qquad \text{Multiply to check } AA^{-1} = I. $$


The box from the columns of A has volume = det A = 2 (the same as the box from the
rows, since |Aᵀ| = |A|). The box from A⁻¹ has volume 1/|A| = 1/2.
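
Worked Example 5.3 B can be replayed with exact fractions. The following is a minimal Python sketch for 3 by 3 systems, assuming integer entries; cramer and det3 are hypothetical helper names. The inverse is built column by column, solving Ax = (column of I).

```python
from fractions import Fraction


def det3(a):
    (a11, a12, a13), (a21, a22, a23), (a31, a32, a33) = a
    return (a11 * (a22 * a33 - a23 * a32)
            - a12 * (a21 * a33 - a23 * a31)
            + a13 * (a21 * a32 - a22 * a31))


def cramer(a, b):
    """Solve a 3 by 3 system by Cramer's Rule: x_j = det B_j / det A."""
    d = det3(a)
    if d == 0:
        raise ValueError("Cramer's Rule breaks down when det A = 0")
    x = []
    for j in range(3):
        # B_j is A with column j replaced by the right side b.
        bj = [[b[i] if k == j else a[i][k] for k in range(3)] for i in range(3)]
        x.append(Fraction(det3(bj), d))
    return x


A = [[2, 6, 2], [1, 4, 2], [5, 9, 0]]
x = cramer(A, [0, 0, 1])
print([str(c) for c in x])                        # ['2', '-1', '1']

# Column j of the inverse solves Ax = column j of I.
inv_cols = [cramer(A, e) for e in ([1, 0, 0], [0, 1, 0], [0, 0, 1])]
A_inv = [[inv_cols[j][i] for j in range(3)] for i in range(3)]
print([[str(c) for c in row] for row in A_inv])
# [['-9', '9', '2'], ['5', '-5', '-1'], ['-11/2', '6', '1']]
```

Cramer's Rule is convenient for a 3 by 3 check like this one; for large systems elimination is far cheaper.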
Problem Set 5.3

Problems 1-5 are about Cramer’s Rule for x = A⁻¹b.

1 Solve these linear equations by Cramer’s Rule x_j = det B_j / det A:

(a) 2x1 + 5x2 = 1
    x1 + 4x2 = 2

(b) 2x1 + x2 = 1
    x1 + 2x2 + x3 = 0
    x2 + 2x3 = 0.


2 Use Cramer’s Rule to solve for y (only). Call the 3 by 3 determinant D:

(a) ax + by = 1
    cx + dy = 0

(b) ax + by + cz = 1
    dx + ey + fz = 0
    gx + hy + iz = 0.

3 Cramer’s Rule breaks down when det A = 0. Example (a) has no solution while
(b) has infinitely many. What are the ratios x_j = det B_j / det A in these two cases?

(a) 2x1 + 3x2 = 1
    4x1 + 6x2 = 1   (parallel lines)

(b) 2x1 + 3x2 = 1
    4x1 + 6x2 = 2   (same line)


4 Quick proof of Cramer’s rule. The determinant is a linear function of column 1. It is
zero if two columns are equal. When b = Ax = x1 a1 + x2 a2 + x3 a3 goes into the
first column of A, the determinant of this matrix B1 is

|b a2 a3| = |x1 a1 + x2 a2 + x3 a3  a2  a3| = x1 |a1 a2 a3| = x1 det A.

(a) What formula for x1 comes from left side = right side?
(b) What steps lead to the middle equation?


5 If the right side b is the first column of A, solve the 3 by 3 system Ax = b. How
does each determinant in Cramer’s Rule lead to this solution x?


Problems 6-15 are about A⁻¹ = Cᵀ / det A. Remember to transpose C.

6 Find A⁻¹ from the cofactor formula Cᵀ / det A. Use symmetry in part (b).

$$ \text{(a)}\; A = \begin{bmatrix} 1 & 2 & 0 \\ 0 & 3 & 0 \\ 0 & 7 & 1 \end{bmatrix} \qquad \text{(b)}\; A = \begin{bmatrix} 2 & -1 & 0 \\ -1 & 2 & -1 \\ 0 & -1 & 2 \end{bmatrix} $$

7 If all the cofactors are zero, how do you know that A has no inverse? If none of the
cofactors are zero, is A sure to be invertible?

8 Find the cofactors of A and multiply ACᵀ to find det A:

$$ A = \begin{bmatrix} 1 & 1 & 4 \\ 1 & 2 & 2 \\ 1 & 2 & 5 \end{bmatrix} \quad\text{and}\quad C = \begin{bmatrix} 6 & -3 & 0 \\ \cdot & \cdot & \cdot \\ \cdot & \cdot & \cdot \end{bmatrix} \quad\text{and}\quad AC^T = \underline{\qquad} $$

If you change that 4 to 100, why is det A unchanged?

9 Suppose det A = 1 and you know all the cofactors in C. How can you find A?

10 From the formula ACᵀ = (det A)I show that det C = (det A)^(n−1).

11 If all entries of A are integers, and det A = 1 or −1, prove that all entries of A⁻¹ are
integers. Give a 2 by 2 example with no zero entries.

12 If all entries of A and A⁻¹ are integers, prove that det A = 1 or −1. Hint: What is
det A times det A⁻¹?

13 Complete the calculation of A⁻¹ by cofactors that was started in Example 5.


14 L is lower triangular and S is symmetric. Assume they are invertible:

$$ \text{To invert triangular } L \text{ and symmetric } S \qquad L = \begin{bmatrix} a & 0 & 0 \\ b & c & 0 \\ d & e & f \end{bmatrix} \qquad S = \begin{bmatrix} a & b & d \\ b & c & e \\ d & e & f \end{bmatrix} $$

(a) Which three cofactors of L are zero? Then L⁻¹ is also lower triangular.
(b) Which three pairs of cofactors of S are equal? Then S⁻¹ is also symmetric.
(c) The cofactor matrix C of an orthogonal Q will be ____. Why?

15 For n = 5 the matrix C contains ____ cofactors. Each 4 by 4 cofactor contains ____
terms and each term needs ____ multiplications. Compare with 5³ = 125
for the Gauss-Jordan computation of A⁻¹ in Section 2.4.

Problems 16-26 are about area and volume by determinants.


16 (a) Find the area of the parallelogram with edges v = (3,2) and w = (1, 4). 
(b) Find the area of the triangle with sides v, w, and v + w. Draw it. 
(c) Find the area of the triangle with sides v, w, and w — v. Draw it. 
17 A box has edges from (0, 0, 0) to (3, 1, 1) and (1, 3, 1) and (1, 1, 3). Find its volume.
Also find the area of each parallelogram face using ||u × v||.
18 (a) The corners of a triangle are (2, 1) and (3, 4) and (0, 5). What is the area? 
(b) Add a corner at (—1, 0) to make a lopsided region (four sides). Find the area. 
19 The parallelogram with sides (2, 1) and (2, 3) has the same area as the parallelogram 


with sides (2, 2) and (1, 3). Find those areas from 2 by 2 determinants and say why 
they must be equal. (I can’t see why from a picture. Please write to me if you do.) 


20 The Hadamard matrix H has orthogonal rows. The box is a hypercube!

$$ \text{What is } |H| = \begin{vmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \\ 1 & -1 & 1 & -1 \end{vmatrix} = \text{volume of a hypercube in } R^4\text{?} $$

21 If the columns of a 4 by 4 matrix have lengths L1, L2, L3, L4, what is the largest
possible value for the determinant (based on volume)? If all entries of the matrix are
1 or −1, what are those lengths and the maximum determinant?


22 Show by a picture how a rectangle with area x1 y2 minus a rectangle with area x2 y1
produces the same area as our parallelogram.

23 When the edge vectors a, b, c are perpendicular, the volume of the box is ||a|| times
||b|| times ||c||. The matrix AᵀA is ____. Find det AᵀA and det A.

24 The box with edges i and j and w = 2i + 3j + 4k has height ____. What is the
volume? What is the matrix with this determinant? What is i × j and what is its dot
product with w?

25 An n-dimensional cube has how many corners? How many edges? How many
(n − 1)-dimensional faces? The cube in R^n whose edges are the rows of 2I has
volume ____. A hypercube computer has parallel processors at the corners with
connections along the edges.

26 The triangle with corners (0, 0), (1, 0), (0, 1) has area 1/2. The pyramid in R³ with four
corners (0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1) has volume ____. What is the volume
of a pyramid in R⁴ with five corners at (0, 0, 0, 0) and the rows of I?

Problems 27-30 are about areas dA and volumes d V in calculus. 


27 Polar coordinates satisfy x = r cos θ and y = r sin θ. Polar area is J dr dθ:

$$ J = \begin{vmatrix} \partial x/\partial r & \partial x/\partial\theta \\ \partial y/\partial r & \partial y/\partial\theta \end{vmatrix} = \begin{vmatrix} \cos\theta & -r\sin\theta \\ \sin\theta & r\cos\theta \end{vmatrix}. $$

The two columns are orthogonal. Their lengths are ____. Thus J = ____.

28 Spherical coordinates ρ, φ, θ satisfy x = ρ sin φ cos θ and y = ρ sin φ sin θ and
z = ρ cos φ. Find the 3 by 3 matrix of partial derivatives: ∂x/∂ρ, ∂x/∂φ, ∂x/∂θ in
row 1. Simplify its determinant to J = ρ² sin φ. Then dV in spherical coordinates
is ρ² sin φ dρ dφ dθ, the volume of an infinitesimal “coordinate box”.

29 The matrix that connects r, θ to x, y is in Problem 27. Invert that 2 by 2 matrix:

$$ J^{-1} = \begin{vmatrix} \partial r/\partial x & \partial r/\partial y \\ \partial\theta/\partial x & \partial\theta/\partial y \end{vmatrix} = \begin{vmatrix} \cos\theta & ? \\ ? & ? \end{vmatrix} = \;? $$

It is surprising that ∂r/∂x = ∂x/∂r (Calculus, Gilbert Strang, p. 501). Multiplying
the matrices J and J⁻¹ gives the chain rule ∂x/∂x = (∂x/∂r)(∂r/∂x) + (∂x/∂θ)(∂θ/∂x) = 1.

30 The triangle with corners (0, 0), (6, 0), and (1, 4) has area ____. When you rotate
it by θ = 60° the area is ____. The determinant of the rotation matrix is

$$ J = \begin{vmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{vmatrix} = \begin{vmatrix} \tfrac12 & ? \\ ? & ? \end{vmatrix} = \;? $$

Problems 31-38 are about the triple product (u × v) · w in three dimensions.

31 A box has base area ||u × v||. Its perpendicular height is ||w|| cos θ. Base area times
height = volume = ||u × v|| ||w|| cos θ which is (u × v) · w. Compute base area,
height, and volume for u = (2, 4, 0), v = (−1, 3, 0), w = (1, 2, 2).


32 The volume of the same box is given more directly by a 3 by 3 determinant. Evaluate 
that determinant. 


33 Expand the 3 by 3 determinant in equation (13) in cofactors of its row u1, u2, u3.
This expansion is the dot product of u with the vector ____.

34 Which of the triple products (u × w) · v and (w × u) · v and (v × w) · u are the same
as (u × v) · w? Which orders of the rows u, v, w give the correct determinant?

35 Let P = (1, 0, −1) and Q = (1, 1, 1) and R = (2, 2, 1). Choose S so that PQRS
is a parallelogram and compute its area. Choose T, U, V so that OPQRSTUV is a
tilted box and compute its volume.


36 Suppose (x, y,z) and (1, 1,0) and (1,2, 1) lie on a plane through the origin. What 
determinant is zero? What equation does this give for the plane? 


37 Suppose (x, y, z) is a linear combination of (2, 3, 1) and (1, 2,3). What determinant 
is zero? What equation does this give for the plane of all combinations? 


38 (a) Explain from volumes why det 2A = 2^n det A for n by n matrices.
(b) For what size matrix is the false statement det A + det A = det(A + A) true?


Challenge Problems 


39 If you know all 16 cofactors of a 4 by 4 invertible matrix A, how would you find A?


40 Suppose A is a 5 by 5 matrix. Its entries in row 1 multiply determinants (cofactors) 
in rows 2-5 to give the determinant. Can you guess a “Jacobi formula” for det A 
using 2 by 2 determinants from rows 1-2 times 3 by 3 determinants from rows 3-5? 


Test your formula on the —1, 2, —1 tridiagonal matrix that has determinant = 6. 


41 The 2 by 2 matrix AB = (2 by 3)(3 by 2) has a “Cauchy-Binet formula” for det AB:

det AB = sum of (2 by 2 determinants in A) (2 by 2 determinants in B)

(a) Guess which 2 by 2 determinants to use from A and B.
(b) Test your formula when the rows of A are 1, 2, 3 and 1, 4, 7 with B = Aᵀ.


Chapter 6 


Eigenvalues and Eigenvectors 


6.1 Introduction to Eigenvalues 


Linear equations Ax = b come from steady state problems. Eigenvalues have their greatest
importance in dynamic problems. The solution of du/dt = Au is changing with time:
growing or decaying or oscillating. We can’t find it by elimination. This chapter enters a
new part of linear algebra, based on Ax = λx. All matrices in this chapter are square.

A good model comes from the powers A, A², A³, ... of a matrix. Suppose you need the
hundredth power A¹⁰⁰. The starting matrix A becomes unrecognizable after a few steps,
and A¹⁰⁰ is very close to [.6 .6; .4 .4]:

$$ A = \begin{bmatrix} .8 & .3 \\ .2 & .7 \end{bmatrix} \qquad A^2 = \begin{bmatrix} .70 & .45 \\ .30 & .55 \end{bmatrix} \qquad A^3 = \begin{bmatrix} .650 & .525 \\ .350 & .475 \end{bmatrix} \qquad A^{100} = \begin{bmatrix} .6000 & .6000 \\ .4000 & .4000 \end{bmatrix} $$

A¹⁰⁰ was found by using the eigenvalues of A, not by multiplying 100 matrices. Those
eigenvalues (here they are 1 and 1/2) are a new way to see into the heart of a matrix.
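
The displayed powers can be confirmed by brute force, which is exactly the work the eigenvalues let us avoid. This is a minimal Python sketch with a hypothetical helper matmul for 2 by 2 matrices; floating-point arithmetic means the entries are only approximate.

```python
def matmul(a, b):
    """Product of two 2 by 2 matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


A = [[0.8, 0.3], [0.2, 0.7]]
P = A
for power in range(2, 101):
    P = matmul(P, A)          # P now holds A raised to this power
    if power in (2, 3, 100):
        print(power, [[round(x, 4) for x in row] for row in P])
```

After the loop P holds A¹⁰⁰, and its rows should agree with (.6, .6) and (.4, .4) to the digits printed.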

To explain eigenvalues, we first explain eigenvectors. Almost all vectors change
direction, when they are multiplied by A. Certain exceptional vectors x are in the same
direction as Ax. Those are the “eigenvectors”. Multiply an eigenvector by A, and the
vector Ax is a number λ times the original x.

The basic equation is Ax = λx. The number λ is an eigenvalue of A.

The eigenvalue λ tells whether the special vector x is stretched or shrunk or reversed or left
unchanged, when it is multiplied by A. We may find λ = 2 or 1/2 or −1 or 1. The eigenvalue
λ could be zero! Then Ax = 0x means that this eigenvector x is in the nullspace.
If A is the identity matrix, every vector has Ax = x. All vectors are eigenvectors of I.
All eigenvalues “lambda” are λ = 1. This is unusual to say the least. Most 2 by 2 matrices
have two eigenvector directions and two eigenvalues. We will show that det(A − λI) = 0.

This section will explain how to compute the x’s and λ’s. It can come early in the course
because we only need the determinant of a 2 by 2 matrix. Let me use det(A − λI) = 0 to
find the eigenvalues for this first example, and then derive it properly in equation (3).


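Example 1 The matrix A has two eigenvalues λ = 1 and λ = 1/2. Look at det(A − λI):

$$ A = \begin{bmatrix} .8 & .3 \\ .2 & .7 \end{bmatrix} \qquad \det\begin{bmatrix} .8-\lambda & .3 \\ .2 & .7-\lambda \end{bmatrix} = \lambda^2 - \tfrac{3}{2}\lambda + \tfrac{1}{2} = (\lambda - 1)\left(\lambda - \tfrac{1}{2}\right). $$

I factored the quadratic into λ − 1 times λ − 1/2 to see the two eigenvalues λ = 1 and
λ = 1/2. For those numbers, the matrix A − λI becomes singular (zero determinant). The
eigenvectors x1 and x2 are in the nullspaces of A − I and A − (1/2)I.

(A − I)x1 = 0 is Ax1 = x1 and the first eigenvector is (.6, .4).

(A − (1/2)I)x2 = 0 is Ax2 = (1/2)x2 and the second eigenvector is (1, −1):

$$ x_1 = \begin{bmatrix} .6 \\ .4 \end{bmatrix} \quad\text{and}\quad Ax_1 = \begin{bmatrix} .8 & .3 \\ .2 & .7 \end{bmatrix}\begin{bmatrix} .6 \\ .4 \end{bmatrix} = x_1 \qquad (Ax = x \text{ means that } \lambda_1 = 1) $$

$$ x_2 = \begin{bmatrix} 1 \\ -1 \end{bmatrix} \quad\text{and}\quad Ax_2 = \begin{bmatrix} .8 & .3 \\ .2 & .7 \end{bmatrix}\begin{bmatrix} 1 \\ -1 \end{bmatrix} = \begin{bmatrix} .5 \\ -.5 \end{bmatrix} \qquad (\text{this is } \tfrac12 x_2 \text{ so } \lambda_2 = \tfrac12). $$

If x1 is multiplied again by A, we still get x1. Every power of A will give A^n x1 = x1.
Multiplying x2 by A gave (1/2)x2, and if we multiply again we get (1/2)² times x2.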


When A is squared, the eigenvectors stay the same. The eigenvalues are squared.

This pattern keeps going, because the eigenvectors stay in their own directions (Figure 6.1)
and never get mixed. The eigenvectors of A¹⁰⁰ are the same x1 and x2. The eigenvalues
of A¹⁰⁰ are 1¹⁰⁰ = 1 and (1/2)¹⁰⁰ = very small number.

Figure 6.1: The eigenvectors keep their directions. A² has eigenvalues 1² and (.5)².


Other vectors do change direction. But all other vectors are combinations of the two
eigenvectors. The first column of A is the combination x1 + (.2)x2:

$$ \text{Separate into eigenvectors} \qquad \begin{bmatrix} .8 \\ .2 \end{bmatrix} = x_1 + (.2)x_2 = \begin{bmatrix} .6 \\ .4 \end{bmatrix} + \begin{bmatrix} .2 \\ -.2 \end{bmatrix}. \quad (1) $$

Multiplying by A gives (.7, .3), the first column of A². Do it separately for x1 and (.2)x2.
Of course Ax1 = x1. And A multiplies x2 by its eigenvalue 1/2:

$$ \text{Multiply each } x_i \text{ by } \lambda_i \qquad A\begin{bmatrix} .8 \\ .2 \end{bmatrix} = \begin{bmatrix} .7 \\ .3 \end{bmatrix} \text{ is } x_1 + \tfrac12(.2)x_2 = \begin{bmatrix} .6 \\ .4 \end{bmatrix} + \begin{bmatrix} .1 \\ -.1 \end{bmatrix}. $$

Each eigenvector is multiplied by its eigenvalue, when we multiply by A. We didn’t need
these eigenvectors to find A². But it is the good way to do 99 multiplications. At every step
x1 is unchanged and x2 is multiplied by (1/2), so we have (1/2)⁹⁹:

$$ A^{99}\begin{bmatrix} .8 \\ .2 \end{bmatrix} \text{ is really } x_1 + (.2)\left(\tfrac12\right)^{99}x_2 = \begin{bmatrix} .6 \\ .4 \end{bmatrix} + \begin{bmatrix} \text{very} \\ \text{small} \\ \text{vector} \end{bmatrix}. $$

This is the first column of A¹⁰⁰. The number we originally wrote as .6000 was not exact.
We left out (.2)(1/2)⁹⁹ which wouldn’t show up for 30 decimal places.
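
The eigenvector shortcut and the 99 direct multiplications can be compared side by side. A minimal Python sketch, assuming the eigenvectors x1 = (.6, .4) and x2 = (1, −1) found above; the function name power_times_first_column is hypothetical.

```python
def matvec(a, x):
    """2 by 2 matrix times a vector."""
    return [a[0][0] * x[0] + a[0][1] * x[1], a[1][0] * x[0] + a[1][1] * x[1]]


A = [[0.8, 0.3], [0.2, 0.7]]
x1, x2 = [0.6, 0.4], [1.0, -1.0]     # eigenvectors for eigenvalues 1 and 1/2


def power_times_first_column(k):
    """A^k times (.8, .2), computed from the eigenvectors as x1 + (.2)(1/2)^k x2."""
    c = 0.2 * 0.5 ** k
    return [x1[0] + c * x2[0], x1[1] + c * x2[1]]


# The slow way: multiply (.8, .2) by A ninety-nine times.
v = [0.8, 0.2]
for _ in range(99):
    v = matvec(A, v)

formula = power_times_first_column(99)
print(v, formula)          # both are the first column of A^100, close to (.6, .4)
print(0.2 * 0.5 ** 99)     # roughly 3e-31, the part that was left out of .6000
```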

The eigenvector x1 is a “steady state” that doesn’t change (because λ1 = 1). The
eigenvector x2 is a “decaying mode” that virtually disappears (because λ2 = .5). The
higher the power of A, the closer its columns approach the steady state.

We mention that this particular A is a Markov matrix. Its entries are positive and
every column adds to 1. Those facts guarantee that the largest eigenvalue is λ = 1 (as we
found). Its eigenvector x1 = (.6, .4) is the steady state, which all columns of A^k will
approach. Section 8.3 shows how Markov matrices appear in applications like Google.

For projections we can spot the steady state (λ = 1) and the nullspace (λ = 0).


Example 2 The projection matrix P = [.5 .5; .5 .5] has eigenvalues λ = 1 and λ = 0.

Its eigenvectors are x1 = (1, 1) and x2 = (1, −1). For those vectors, Px1 = x1 (steady
state) and Px2 = 0 (nullspace). This example illustrates Markov matrices and singular
matrices and (most important) symmetric matrices. All have special λ’s and x’s:

1. Each column of P = [.5 .5; .5 .5] adds to 1, so λ = 1 is an eigenvalue.
2. P is singular, so λ = 0 is an eigenvalue.
3. P is symmetric, so its eigenvectors (1, 1) and (1, −1) are perpendicular.

The only eigenvalues of a projection matrix are 0 and 1. The eigenvectors for λ = 0
(which means Px = 0x) fill up the nullspace. The eigenvectors for λ = 1 (which means
Px = x) fill up the column space. The nullspace is projected to zero. The column space
projects onto itself. The projection keeps the column space and destroys the nullspace:


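$$ \text{Project each part} \qquad v = \begin{bmatrix} 1 \\ -1 \end{bmatrix} + \begin{bmatrix} 2 \\ 2 \end{bmatrix} \quad\text{projects onto}\quad Pv = \begin{bmatrix} 0 \\ 0 \end{bmatrix} + \begin{bmatrix} 2 \\ 2 \end{bmatrix}. $$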


Special properties of a matrix lead to special eigenvalues and eigenvectors.
That is a major theme of this chapter (it is captured in a table at the very end).

Projections have λ = 0 and 1. Permutations have all |λ| = 1. The next matrix R (a
reflection and at the same time a permutation) is also special.

Example 3 The reflection matrix R = [0 1; 1 0] has eigenvalues 1 and −1.

The eigenvector (1, 1) is unchanged by R. The second eigenvector is (1, −1); its signs
are reversed by R. A matrix with no negative entries can still have a negative eigenvalue!
The eigenvectors for R are the same as for P, because reflection = 2(projection) − I:

$$ R = 2P - I \qquad \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} = 2\begin{bmatrix} .5 & .5 \\ .5 & .5 \end{bmatrix} - \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}. \quad (2) $$

Here is the point. If Px = λx then 2Px = 2λx. The eigenvalues are doubled when
the matrix is doubled. Now subtract Ix = x. The result is (2P − I)x = (2λ − 1)x.
When a matrix is shifted by I, each λ is shifted by 1. No change in eigenvectors.


[Left panel: projection onto blue line, Px1 = x1 and Px2 = 0x2. Right panel: reflection across line, Rx1 = x1 and Rx2 = −x2.]

Figure 6.2: Projections P have eigenvalues 1 and 0. Reflections R have λ = 1 and −1.
A typical x changes direction, but not the eigenvectors x1 and x2.

Key idea: The eigenvalues of R and P are related exactly as the matrices are related:

The eigenvalues of R = 2P − I are 2(1) − 1 = 1 and 2(0) − 1 = −1.

The eigenvalues of R² are λ². In this case R² = I. Check (1)² = 1 and (−1)² = 1.
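
The relation R = 2P − I and its effect on the eigenvalues can be checked directly. A minimal Python sketch; matvec and the variable names are illustrative.

```python
def matvec(a, x):
    """2 by 2 matrix times a vector."""
    return [a[0][0] * x[0] + a[0][1] * x[1], a[1][0] * x[0] + a[1][1] * x[1]]


P = [[0.5, 0.5], [0.5, 0.5]]
identity = [[1, 0], [0, 1]]
# Build the reflection from the projection, entry by entry.
R = [[2 * P[i][j] - identity[i][j] for j in range(2)] for i in range(2)]
print(R)                               # [[0.0, 1.0], [1.0, 0.0]]

x1, x2 = [1, 1], [1, -1]
print(matvec(P, x1), matvec(P, x2))    # [1.0, 1.0] [0.0, 0.0]   eigenvalues 1 and 0
print(matvec(R, x1), matvec(R, x2))    # [1.0, 1.0] [-1.0, 1.0]  eigenvalues 1 and -1
```

Both matrices share the eigenvectors (1, 1) and (1, −1); only the eigenvalues move, from λ to 2λ − 1.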


The Equation for the Eigenvalues 


For projections and reflections we found λ’s and x’s by geometry: Px = x, Px = 0,
Rx = −x. Now we use determinants and linear algebra. This is the key calculation in
the chapter: almost every application starts by solving Ax = λx.

First move λx to the left side. Write the equation Ax = λx as (A − λI)x = 0. The
matrix A − λI times the eigenvector x is the zero vector. The eigenvectors make up the
nullspace of A − λI. When we know an eigenvalue λ, we find an eigenvector by solving
(A − λI)x = 0.

Eigenvalues first. If (A − λI)x = 0 has a nonzero solution, A − λI is not invertible.
The determinant of A − λI must be zero. This is how to recognize an eigenvalue λ:

Eigenvalues The number λ is an eigenvalue of A if and only if A − λI is singular:

Equation for the eigenvalues    det(A − λI) = 0.    (3)

This “characteristic polynomial” det(A − λI) involves only λ, not x. When A is n by n,
equation (3) has degree n. Then A has n eigenvalues (repeats possible!) Each λ leads to x:

For each eigenvalue λ solve (A − λI)x = 0 or Ax = λx to find an eigenvector x.


Example 4 A = [1 2; 2 4] is already singular (zero determinant). Find its λ’s and x’s.

When A is singular, λ = 0 is one of the eigenvalues. The equation Ax = 0x has
solutions. They are the eigenvectors for λ = 0. But det(A − λI) = 0 is the way to find all
λ’s and x’s. Always subtract λI from A:

$$ \text{Subtract } \lambda \text{ from the diagonal to find} \qquad A - \lambda I = \begin{bmatrix} 1-\lambda & 2 \\ 2 & 4-\lambda \end{bmatrix}. \quad (4) $$

Take the determinant “ad − bc” of this 2 by 2 matrix. From 1 − λ times 4 − λ,
the “ad” part is λ² − 5λ + 4. The “bc” part, not containing λ, is 2 times 2.

$$ \det\begin{bmatrix} 1-\lambda & 2 \\ 2 & 4-\lambda \end{bmatrix} = (1-\lambda)(4-\lambda) - (2)(2) = \lambda^2 - 5\lambda. \quad (5) $$

Set this determinant λ² − 5λ to zero. One solution is λ = 0 (as expected, since A is
singular). Factoring into λ times λ − 5, the other root is λ = 5:

det(A − λI) = λ² − 5λ = 0 yields the eigenvalues λ1 = 0 and λ2 = 5.

Now find the eigenvectors. Solve (A − λI)x = 0 separately for λ1 = 0 and λ2 = 5:

$$ (A - 0I)x = \begin{bmatrix} 1 & 2 \\ 2 & 4 \end{bmatrix}\begin{bmatrix} y \\ z \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix} \text{ yields an eigenvector } \begin{bmatrix} y \\ z \end{bmatrix} = \begin{bmatrix} 2 \\ -1 \end{bmatrix} \text{ for } \lambda_1 = 0 $$

$$ (A - 5I)x = \begin{bmatrix} -4 & 2 \\ 2 & -1 \end{bmatrix}\begin{bmatrix} y \\ z \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix} \text{ yields an eigenvector } \begin{bmatrix} y \\ z \end{bmatrix} = \begin{bmatrix} 1 \\ 2 \end{bmatrix} \text{ for } \lambda_2 = 5. $$

The matrices A − 0I and A − 5I are singular (because 0 and 5 are eigenvalues). The
eigenvectors (2, −1) and (1, 2) are in the nullspaces: (A − λI)x = 0 is Ax = λx.

We need to emphasize: There is nothing exceptional about λ = 0. Like every other
number, zero might be an eigenvalue and it might not. If A is singular, it is. The eigenvectors
fill the nullspace: Ax = 0x = 0. If A is invertible, zero is not an eigenvalue. We shift
A by a multiple of I to make it singular.

In the example, the shifted matrix A − 5I is singular and 5 is the other eigenvalue.

Summary To solve the eigenvalue problem for an n by n matrix, follow these steps:

1. Compute the determinant of A − λI.
2. Find the roots of this polynomial.
3. For each root λ, solve (A − λI)x = 0 to find an eigenvector x.

A note on the eigenvectors of 2 by 2 matrices. When A − λI is singular, both rows are
multiples of a vector (a, b). The eigenvector is any multiple of (b, −a). The example had
λ = 0 and λ = 5:

λ = 0: rows of A − 0I in the direction (1, 2); eigenvector in the direction (2, −1)
λ = 5: rows of A − 5I in the direction (−4, 2); eigenvector in the direction (2, 4).

Previously we wrote that last eigenvector as (1, 2). Both (1, 2) and (2, 4) are correct.
There is a whole line of eigenvectors: any nonzero multiple of x is as good as x.
MATLAB’s eig(A) divides by the length, to make the eigenvector into a unit vector.
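
For a 2 by 2 matrix the three summary steps and the (b, −a) rule fit in one short function. This is a minimal Python sketch, not MATLAB's eig: the name eig2 is hypothetical, it assumes real eigenvalues, and it returns eigenvectors without normalizing them to unit length.

```python
import math


def eig2(a):
    """Eigenvalues and eigenvectors of a real 2 by 2 matrix with real eigenvalues.

    The eigenvalues are the roots of lambda^2 - (trace) lambda + det = 0.
    Each eigenvector is (b, -a), where (a, b) is a nonzero row of A - lambda I.
    """
    trace = a[0][0] + a[1][1]
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    disc = trace * trace - 4 * det
    if disc < 0:
        raise ValueError("complex eigenvalues: this sketch handles real ones only")
    root = math.sqrt(disc)
    pairs = []
    for lam in ((trace - root) / 2, (trace + root) / 2):
        rows = [(a[0][0] - lam, a[0][1]), (a[1][0], a[1][1] - lam)]
        # Pick a nonzero row (ra, rb) of A - lambda I; if both rows vanish, any x works.
        ra, rb = next((r for r in rows if r != (0, 0)), (0, 1))
        pairs.append((lam, (rb, -ra)))
    return pairs


for lam, x in eig2([[1, 2], [2, 4]]):
    print(lam, x)
# 0.0 (2, -1.0)
# 5.0 (2, 4.0)
```

Testing a row for exact zeros is fine for small integer examples like this one; with general floating-point entries a tolerance would be needed.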

We end with a warning. Some 2 by 2 matrices have only one line of eigenvectors. 
This can only happen when two eigenvalues are equal. (On the other hand A = I has
equal eigenvalues and plenty of eigenvectors.) Similarly some n by n matrices don’t have
n independent eigenvectors. Without n eigenvectors, we don’t have a basis. We can’t write 
every v as a combination of eigenvectors. In the language of the next section, we can’t 
diagonalize a matrix without n independent eigenvectors. 


Good News, Bad News 


Bad news first: If you add a row of A to another row, or exchange rows, the eigenvalues
usually change. Elimination does not preserve the λ’s. The triangular U has its eigenvalues
sitting along the diagonal; they are the pivots. But they are not the eigenvalues of A!
Eigenvalues are changed when 2 times row 1 is added to row 2:

U = [1 3; 0 0] has λ = 0 and λ = 1;    A = [1 3; 2 6] has λ = 0 and λ = 7.

Good news second: The product λ1 times λ2 and the sum λ1 + λ2 can be found quickly
from the matrix. For this A, the product is 0 times 7. That agrees with the determinant
(which is 0). The sum of eigenvalues is 0 + 7. That agrees with the sum down the main
diagonal (the trace is 1 + 6). These quick checks always work:

The product of the n eigenvalues equals the determinant.
The sum of the n eigenvalues equals the sum of the n diagonal entries.

The sum of the entries on the main diagonal is called the trace of A:

$$ \lambda_1 + \lambda_2 + \cdots + \lambda_n = \text{trace} = a_{11} + a_{22} + \cdots + a_{nn}. $$

Those checks are very useful. They are proved in Problems 16-17 and again in the next
section. They don’t remove the pain of computing λ’s. But when the computation is wrong,
they generally tell us so. To compute the correct λ’s, go back to det(A − λI) = 0.
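
The two quick checks are simple to automate for a 2 by 2 matrix. A minimal Python sketch; the function name quick_checks and the tolerance tol are assumptions for illustration. Passing both checks does not prove the eigenvalues are right, but failing either one proves they are wrong.

```python
def quick_checks(a, eigenvalues, tol=1e-9):
    """Return (sum_ok, product_ok) for a 2 by 2 matrix and two claimed eigenvalues:
    the sum should equal the trace and the product should equal the determinant."""
    trace = a[0][0] + a[1][1]
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    l1, l2 = eigenvalues
    return (abs(l1 + l2 - trace) < tol, abs(l1 * l2 - det) < tol)


A = [[1, 3], [2, 6]]
print(quick_checks(A, (0, 7)))   # (True, True)
print(quick_checks(A, (0, 1)))   # (False, True): the pivots of U fail the trace check
print(quick_checks(A, (1, 6)))   # (True, False): the diagonal entries fail the product check
```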

The determinant test makes the product of the λ’s equal to the product of the pivots
(assuming no row exchanges). But the sum of the λ’s is not the sum of the pivots, as the
example showed. The individual λ’s have almost nothing to do with the pivots. In this new
part of linear algebra, the key equation is really nonlinear: λ multiplies x.


Why do the eigenvalues of a triangular matrix lie on its diagonal? 


Imaginary Eigenvalues 


One more bit of news (not too terrible). The eigenvalues might not be real numbers.
Consider the 90° rotation Q = [0 −1; 1 0].

After a rotation, no vector Qx stays in the same direction as x (except x = 0 which is
useless). There cannot be an eigenvector, unless we go to imaginary numbers. Which we
do.

To see how i can help, look at Q² which is −I. If Q is rotation through 90°, then
Q² is rotation through 180°. Its eigenvalues are −1 and −1. (Certainly −Ix = −1x.)
Squaring Q will square each λ, so we must have λ² = −1. The eigenvalues of the 90°
rotation matrix Q are +i and −i, because i² = −1.

Those λ’s come as usual from det(Q − λI) = 0. This equation gives λ² + 1 = 0.
Its roots are i and −i. We meet the imaginary number i also in the eigenvectors:

$$ \text{Complex eigenvectors} \qquad \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}\begin{bmatrix} 1 \\ i \end{bmatrix} = -i\begin{bmatrix} 1 \\ i \end{bmatrix} \quad\text{and}\quad \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}\begin{bmatrix} i \\ 1 \end{bmatrix} = i\begin{bmatrix} i \\ 1 \end{bmatrix}. $$

Somehow these complex vectors x1 = (1, i) and x2 = (i, 1) keep their direction as
they are rotated. Don’t ask me how. This example makes the all-important point that real
matrices can easily have complex eigenvalues and eigenvectors. The particular eigenvalues
i and −i also illustrate two special properties of Q:

1. Q is an orthogonal matrix so the absolute value of each λ is |λ| = 1.
2. Q is a skew-symmetric matrix so each λ is pure imaginary.

A symmetric matrix (Aᵀ = A) can be compared to a real number. A skew-symmetric
matrix (Aᵀ = −A) can be compared to an imaginary number. An orthogonal matrix
(AᵀA = I) can be compared to a complex number with |λ| = 1. For the eigenvalues those
are more than analogies; they are theorems to be proved in Section 6.4.

The eigenvectors for all these special matrices are perpendicular. Somehow (i, 1) and
(1, i) are perpendicular (Chapter 10 explains the dot product of complex vectors).
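
Python's built-in complex numbers (written with j for the imaginary unit) make these claims easy to test. This is a minimal sketch: matvec is a hypothetical helper, and the complex dot product is assumed here to conjugate the first vector, the usual convention for complex vectors.

```python
Q = [[0, -1], [1, 0]]            # rotation through 90 degrees


def matvec(a, x):
    """2 by 2 matrix times a vector (entries may be complex)."""
    return [a[0][0] * x[0] + a[0][1] * x[1], a[1][0] * x[0] + a[1][1] * x[1]]


# Eigenvalue and eigenvector pairs from the text: -i with (1, i) and i with (i, 1).
pairs = [(-1j, [1, 1j]), (1j, [1j, 1])]
for lam, x in pairs:
    qx = matvec(Q, x)
    is_eigen = all(abs(q - lam * xi) < 1e-12 for q, xi in zip(qx, x))
    print(is_eigen, abs(lam), lam.real)   # True 1.0 and zero real part (pure imaginary)

# Complex dot product of the two eigenvectors: conjugate the first vector.
x1, x2 = [1, 1j], [1j, 1]
inner = sum(a.conjugate() * b for a, b in zip(x1, x2))
print(abs(inner))                         # 0.0, so the eigenvectors are perpendicular
```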


Eigshow in MATLAB 


There is a MATLAB demo (just type eigshow), displaying the eigenvalue problem for a 2 
by 2 matrix. It starts with the unit vector x = (1,0). The mouse makes this vector move 
around the unit circle. At the same time the screen shows Ax, in color and also moving. 
Possibly Ax is ahead of x. Possibly Ax is behind x. Sometimes Ax is parallel to x. At 
that parallel moment, Ax = λx (at x₁ and x₂ in the second figure).


[Figure: eigshow for A = [0.8 0.3; 0.2 0.7]. Left panel: x = (1, 0) and y = (0, 1) on the circle of x’s, with Ax = (0.8, 0.2) and Ay = (0.3, 0.7). These are not eigenvectors. Right panel: Ax lines up with x at eigenvectors, where Ax₁ = x₁ and Ax₂ = 0.5x₂.]


The eigenvalue λ is the length of Ax, when the unit eigenvector x lines up. The built-in
choices for A illustrate three possibilities: 0, 1, or 2 directions where Ax crosses x.


0. There are no real eigenvectors. Ax stays behind or ahead of x. This means the 
eigenvalues and eigenvectors are complex, as they are for the rotation Q. 


1. There is only one line of eigenvectors (unusual). The moving directions Ax and x 
touch but don’t cross over. This happens for the last 2 by 2 matrix below. 


2. There are eigenvectors in two independent directions. This is typical! Ax crosses x
at the first eigenvector x₁, and it crosses back at the second eigenvector x₂. Then
Ax and x cross again at −x₁ and −x₂.
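Without MATLAB at hand, the same experiment can be imitated numerically. This is a rough sketch, not the eigshow code: the function name `count_crossing_lines` and the sampling scheme are assumptions made for illustration. It moves x around the unit circle and counts sign changes of x₁(Ax)₂ − x₂(Ax)₁, which is zero exactly when Ax is parallel to x.

```python
import math

def count_crossing_lines(A, steps=3600):
    """Count the lines through 0 on which Ax crosses x (2 by 2 real A)."""
    def cross(theta):
        x1, x2 = math.cos(theta), math.sin(theta)
        ax1 = A[0][0] * x1 + A[0][1] * x2
        ax2 = A[1][0] * x1 + A[1][1] * x2
        return x1 * ax2 - x2 * ax1   # zero when Ax is parallel to x

    # Sample at half-step offsets so these examples never land exactly
    # on an eigenvector direction.
    signs = [cross((k + 0.5) * 2 * math.pi / steps) > 0 for k in range(steps)]
    # signs[k - 1] wraps around at k = 0, closing the circle.
    changes = sum(signs[k] != signs[k - 1] for k in range(steps))
    return changes // 2   # each line is crossed at x and again at -x

markov = [[0.8, 0.3], [0.2, 0.7]]
rotation = [[0, -1], [1, 0]]
```

The count only detects true crossings. In case 1 above, where Ax and x touch without crossing, this sketch would report 0 lines, so it does not separate case 0 from case 1.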


You can mentally follow x and Ax for these five matrices. Under the matrices I will 
count their real eigenvectors. Can you see where Ax lines up with x? 


El fal Goa) Ea Go 
When A is singular (rank one), its column space is a line. The vector Ax goes up
and down that line while x circles around. One eigenvector x is along the line. Another
eigenvector appears when Ax₂ = 0. Zero is an eigenvalue of a singular matrix.


REVIEW OF THE KEY IDEAS


1. Ax = λx says that eigenvectors x keep the same direction when multiplied by A.

2. Ax = λx also says that det(A − λI) = 0. This determines n eigenvalues.

3. The eigenvalues of A² and A⁻¹ are λ² and λ⁻¹, with the same eigenvectors.

4. The sum of the λ’s equals the sum down the main diagonal of A (the trace).
The product of the λ’s equals the determinant.

5. Projections P, reflections R, 90° rotations Q have special eigenvalues 1, 0, −1, i, −i.
Singular matrices have λ = 0. Triangular matrices have λ’s on their diagonal.


WORKED EXAMPLES


6.1 A Find the eigenvalues and eigenvectors of A and A² and A⁻¹ and A + 4I:

$$A = \begin{bmatrix} 2 & -1 \\ -1 & 2 \end{bmatrix} \quad\text{and}\quad A^2 = \begin{bmatrix} 5 & -4 \\ -4 & 5 \end{bmatrix}.$$

Check the trace λ₁ + λ₂ and the determinant λ₁λ₂ for A and also A².


Solution The eigenvalues of A come from det(A − λI) = 0:

$$\det(A - \lambda I) = \begin{vmatrix} 2-\lambda & -1 \\ -1 & 2-\lambda \end{vmatrix} = \lambda^2 - 4\lambda + 3 = 0.$$

This factors into (λ − 1)(λ − 3) = 0 so the eigenvalues of A are λ₁ = 1 and λ₂ = 3. For the
trace, the sum 2 + 2 agrees with 1 + 3. The determinant 3 agrees with the product λ₁λ₂ = 3.
The eigenvectors come separately by solving (A − λI)x = 0 which is Ax = λx:


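$$\lambda = 1: \quad (A - I)x = \begin{bmatrix} 1 & -1 \\ -1 & 1 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix} \quad\text{gives the eigenvector}\quad x_1 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$$

$$\lambda = 3: \quad (A - 3I)x = \begin{bmatrix} -1 & -1 \\ -1 & -1 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix} \quad\text{gives the eigenvector}\quad x_2 = \begin{bmatrix} 1 \\ -1 \end{bmatrix}$$

A² and A⁻¹ and A + 4I keep the same eigenvectors as A. Their eigenvalues are λ² and
λ⁻¹ and λ + 4:

A² has eigenvalues 1² = 1 and 3² = 9. A⁻¹ has 1/1 and 1/3. A + 4I has 1 + 4 = 5 and 3 + 4 = 7.

The trace of A² is 5 + 5 which agrees with 1 + 9. The determinant is 25 − 16 = 9.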

Notes for later sections: A has orthogonal eigenvectors (Section 6.4 on symmetric
matrices). A can be diagonalized since λ₁ ≠ λ₂ (Section 6.2). A is similar to any 2 by 2
matrix with eigenvalues 1 and 3 (Section 6.6). A is a positive definite matrix (Section 6.5)
since A = Aᵀ and the λ’s are positive.
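The arithmetic in 6.1 A can be reproduced from the trace and determinant alone. The sketch below assumes real eigenvalues and uses illustrative helper names (`eig2`, `matmul2`) that are not part of the Teaching Codes; it solves λ² − (trace)λ + det = 0 by the quadratic formula.

```python
import math

def eig2(M):
    """Real eigenvalues of a 2 by 2 matrix, smaller one first."""
    (a, b), (c, d) = M
    trace, det = a + d, a * d - b * c
    disc = trace * trace - 4 * det
    if disc < 0:
        raise ValueError("complex eigenvalues: this sketch handles real ones only")
    r = math.sqrt(disc)
    return ((trace - r) / 2, (trace + r) / 2)

def matmul2(M, N):
    """Product of two 2 by 2 matrices given as lists of rows."""
    return [[sum(M[i][k] * N[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

A = [[2, -1], [-1, 2]]
A2 = matmul2(A, A)          # [[5, -4], [-4, 5]]
lams_A = eig2(A)            # eigenvalues 1 and 3
lams_A2 = eig2(A2)          # eigenvalues 1 and 9, the squares
```

The sum and product of each pair reproduce the trace and determinant checks in the solution.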


6.1 B Find the eigenvalues and eigenvectors of this 3 by 3 matrix A:

Symmetric matrix, singular matrix, trace 1 + 2 + 1 = 4:

$$A = \begin{bmatrix} 1 & -1 & 0 \\ -1 & 2 & -1 \\ 0 & -1 & 1 \end{bmatrix}$$


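Solution Since all rows of A add to zero, the vector x = (1, 1, 1) gives Ax = 0. This
is an eigenvector for the eigenvalue λ = 0. To find λ₂ and λ₃ I will compute the 3 by 3
determinant:

$$\det(A - \lambda I) = \begin{vmatrix} 1-\lambda & -1 & 0 \\ -1 & 2-\lambda & -1 \\ 0 & -1 & 1-\lambda \end{vmatrix} = (1-\lambda)(2-\lambda)(1-\lambda) - 2(1-\lambda)$$
$$= (1-\lambda)\left[(2-\lambda)(1-\lambda) - 2\right] = (1-\lambda)(-\lambda)(3-\lambda).$$

That factor −λ confirms that λ = 0 is a root, and an eigenvalue of A. The other factors
(1 − λ) and (3 − λ) give the other eigenvalues 1 and 3, adding to 4 (the trace). Each
eigenvalue 0, 1, 3 corresponds to an eigenvector:

$$x_1 = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} \; Ax_1 = 0x_1 \qquad x_2 = \begin{bmatrix} 1 \\ 0 \\ -1 \end{bmatrix} \; Ax_2 = 1x_2 \qquad x_3 = \begin{bmatrix} 1 \\ -2 \\ 1 \end{bmatrix} \; Ax_3 = 3x_3.$$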


I notice again that eigenvectors are perpendicular when A is symmetric.

The 3 by 3 matrix produced a third-degree (cubic) polynomial for det(A − λI) =
−λ³ + 4λ² − 3λ. We were lucky to find simple roots λ = 0, 1, 3. Normally we would use
a command like eig(A), and the computation will never even use determinants (Section 9.3
shows a better way for large matrices).

The full command [S, D] = eig(A) will produce unit eigenvectors in the columns of
the eigenvector matrix S. The first one happens to have three minus signs, reversed from
(1, 1, 1) and divided by √3. The eigenvalues of A will be on the diagonal of the eigenvalue
matrix (typed as D but soon called Λ).
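For readers without MATLAB, the three eigenpairs of 6.1 B can be confirmed by direct multiplication. This is a small check in exact integers, with helper names (`matvec`, `dot`) chosen only for this illustration. It does not compute eigenvalues the way eig does.

```python
A = [[1, -1, 0],
     [-1, 2, -1],
     [0, -1, 1]]

def matvec(M, v):
    """Matrix times vector, with M given as a list of rows."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def dot(u, v):
    """Ordinary dot product of two real vectors."""
    return sum(a * b for a, b in zip(u, v))

# (eigenvalue, eigenvector) pairs taken from the worked example
eigenpairs = [(0, [1, 1, 1]), (1, [1, 0, -1]), (3, [1, -2, 1])]

# Ax = lambda x for each pair
checks = [matvec(A, x) == [lam * xi for xi in x] for lam, x in eigenpairs]

trace = sum(A[i][i] for i in range(3))
vectors = [x for _, x in eigenpairs]
# dot products of the three distinct pairs of eigenvectors
pairwise_dots = [dot(vectors[i], vectors[j])
                 for i in range(3) for j in range(i + 1, 3)]
```

All three dot products are zero, which is the perpendicularity noticed above for a symmetric A.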
Problem Set 6.1


1 The example at the start of the chapter has powers of this matrix A:

$$A = \begin{bmatrix} .8 & .3 \\ .2 & .7 \end{bmatrix} \quad\text{and}\quad A^2 = \begin{bmatrix} .70 & .45 \\ .30 & .55 \end{bmatrix} \quad\text{and}\quad A^\infty = \begin{bmatrix} .6 & .6 \\ .4 & .4 \end{bmatrix}.$$

Find the eigenvalues of these matrices. All powers have the same eigenvectors.


(a) Show from A how a row exchange can produce different eigenvalues. 


(b) Why is a zero eigenvalue not changed by the steps of elimination? 


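2 Find the eigenvalues and the eigenvectors of these two matrices:

$$A = \begin{bmatrix} 1 & 4 \\ 2 & 3 \end{bmatrix} \quad\text{and}\quad A + I = \begin{bmatrix} 2 & 4 \\ 2 & 4 \end{bmatrix}.$$

A + I has the ____ eigenvectors as A. Its eigenvalues are ____ by 1.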


3 Compute the eigenvalues and eigenvectors of A and A⁻¹. Check the trace!

$$A = \begin{bmatrix} 0 & 2 \\ 1 & 1 \end{bmatrix} \quad\text{and}\quad A^{-1} = \begin{bmatrix} -1/2 & 1 \\ 1/2 & 0 \end{bmatrix}.$$

A⁻¹ has the ____ eigenvectors as A. When A has eigenvalues λ₁ and λ₂, its inverse
has eigenvalues ____.


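4 Compute the eigenvalues and eigenvectors of A and A²:

$$A = \begin{bmatrix} -1 & 3 \\ 2 & 0 \end{bmatrix} \quad\text{and}\quad A^2 = \begin{bmatrix} 7 & -3 \\ -2 & 6 \end{bmatrix}.$$

A² has the same ____ as A. When A has eigenvalues λ₁ and λ₂, A² has eigenvalues
____. In this example, why is λ₁² + λ₂² = 13?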


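5 Find the eigenvalues of A and B (easy for triangular matrices) and A + B:

$$A = \begin{bmatrix} 3 & 0 \\ 1 & 1 \end{bmatrix} \quad\text{and}\quad B = \begin{bmatrix} 1 & 1 \\ 0 & 3 \end{bmatrix} \quad\text{and}\quad A + B = \begin{bmatrix} 4 & 1 \\ 1 & 4 \end{bmatrix}.$$

Eigenvalues of A + B (are equal to)(are not equal to) eigenvalues of A plus eigenvalues of B.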


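6 Find the eigenvalues of A and B and AB and BA:

$$A = \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix} \quad\text{and}\quad B = \begin{bmatrix} 1 & 2 \\ 0 & 1 \end{bmatrix} \quad\text{and}\quad AB = \begin{bmatrix} 1 & 2 \\ 1 & 3 \end{bmatrix} \quad\text{and}\quad BA = \begin{bmatrix} 3 & 2 \\ 1 & 1 \end{bmatrix}.$$

(a) Are the eigenvalues of AB equal to eigenvalues of A times eigenvalues of B?
(b) Are the eigenvalues of AB equal to the eigenvalues of BA?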

7 Elimination produces A = LU. The eigenvalues of U are on its diagonal; they
are the ____. The eigenvalues of L are on its diagonal; they are all ____. The
eigenvalues of A are not the same as ____.


8 (a) If you know that x is an eigenvector, the way to find λ is to ____.

(b) If you know that λ is an eigenvalue, the way to find x is to ____.


9 What do you do to the equation Ax = λx, in order to prove (a), (b), and (c)?

(a) λ² is an eigenvalue of A², as in Problem 4.
(b) λ⁻¹ is an eigenvalue of A⁻¹, as in Problem 3.
(c) λ + 1 is an eigenvalue of A + I, as in Problem 2.


10 Find the eigenvalues and eigenvectors for both of these Markov matrices A and A^∞.
Explain from those answers why A¹⁰⁰ is close to A^∞:


a= [$ 3] ma ae 1] 


11 Here is a strange fact about 2 by 2 matrices with eigenvalues λ₁ ≠ λ₂: The columns
of A − λ₁I are multiples of the eigenvector x₂. Any idea why this should be?


12 Find three eigenvectors for this matrix P (projection matrices have λ = 1 and 0):

$$\text{Projection matrix} \quad P = \begin{bmatrix} .2 & .4 & 0 \\ .4 & .8 & 0 \\ 0 & 0 & 1 \end{bmatrix}.$$

If two eigenvectors share the same λ, so do all their linear combinations. Find an
eigenvector of P with no zero components.


13 From the unit vector u = (1/6, 1/6, 3/6, 5/6) construct the rank one projection matrix
P = uuᵀ. This matrix has P² = P because uᵀu = 1.

(a) Pu = u comes from (uuᵀ)u = u(____). Then u is an eigenvector with λ = 1.

(b) If v is perpendicular to u show that Pv = 0. Then λ = 0.

(c) Find three independent eigenvectors of P all with eigenvalue λ = 0.
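
14 Solve det(Q − λI) = 0 by the quadratic formula to reach λ = cos θ ± i sin θ:

$$Q = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \quad\text{rotates the } xy \text{ plane by the angle } \theta. \text{ No real } \lambda\text{'s.}$$

Find the eigenvectors of Q by solving (Q − λI)x = 0. Use i² = −1.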

15 Every permutation matrix leaves x = (1, 1, ..., 1) unchanged. Then λ = 1. Find
two more λ’s (possibly complex) for these permutations, from det(P − λI) = 0:

$$P = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{bmatrix} \quad\text{and}\quad P = \begin{bmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{bmatrix}.$$


16 The determinant of A equals the product λ₁λ₂···λₙ. Start with the polynomial
det(A − λI) separated into its n factors (always possible). Then set λ = 0:

det(A − λI) = (λ₁ − λ)(λ₂ − λ)···(λₙ − λ) so det A = ____.

Check this rule in Example 1 where the Markov matrix has λ = 1 and 1/2.


17 The sum of the diagonal entries (the trace) equals the sum of the eigenvalues:

$$A = \begin{bmatrix} a & b \\ c & d \end{bmatrix} \quad\text{has}\quad \det(A - \lambda I) = \lambda^2 - (a + d)\lambda + ad - bc = 0.$$

The quadratic formula gives the eigenvalues λ = (a + d + √____)/2 and λ = ____.
Their sum is ____. If A has λ₁ = 3 and λ₂ = 4 then det(A − λI) = ____.


18 If A has λ₁ = 4 and λ₂ = 5 then det(A − λI) = (λ − 4)(λ − 5) = λ² − 9λ + 20.
Find three matrices that have trace a + d = 9 and determinant 20 and λ = 4, 5.


19 A 3 by 3 matrix B is known to have eigenvalues 0, 1, 2. This information is enough
to find three of these (give the answers where possible):

(a) the rank of B

(b) the determinant of BᵀB

(c) the eigenvalues of BᵀB

(d) the eigenvalues of (B² + I)⁻¹.


20 Choose the last rows of A and C to give eigenvalues 4, 7 and 1, 2, 3:

$$\text{Companion matrices} \quad A = \begin{bmatrix} 0 & 1 \\ * & * \end{bmatrix} \qquad C = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ * & * & * \end{bmatrix}$$


21 The eigenvalues of A equal the eigenvalues of Aᵀ. This is because det(A − λI)
equals det(Aᵀ − λI). That is true because ____. Show by an example that the
eigenvectors of A and Aᵀ are not the same.


22 Construct any 3 by 3 Markov matrix M: positive entries down each column add to 1.
Show that Mᵀ(1, 1, 1) = (1, 1, 1). By Problem 21, λ = 1 is also an eigenvalue
of M. Challenge: A 3 by 3 singular Markov matrix with trace 1/2 has what λ’s?

23 Find three 2 by 2 matrices that have λ₁ = λ₂ = 0. The trace is zero and the
determinant is zero. A might not be the zero matrix but check that A² = 0.


24 This matrix is singular with rank one. Find three λ’s and three eigenvectors:

$$A = \begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix}\begin{bmatrix} 2 & 1 & 2 \end{bmatrix} = \begin{bmatrix} 2 & 1 & 2 \\ 4 & 2 & 4 \\ 2 & 1 & 2 \end{bmatrix}.$$


25 Suppose A and B have the same eigenvalues λ₁, ..., λₙ with the same independent
eigenvectors x₁, ..., xₙ. Then A = B. Reason: Any vector x is a combination
c₁x₁ + ··· + cₙxₙ. What is Ax? What is Bx?


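26 The block B has eigenvalues 1, 2 and C has eigenvalues 3, 4 and D has eigenvalues 5, 7.
Find the eigenvalues of the 4 by 4 matrix A:

$$A = \begin{bmatrix} B & C \\ 0 & D \end{bmatrix} = \begin{bmatrix} 0 & 1 & 3 & 0 \\ -2 & 3 & 0 & 4 \\ 0 & 0 & 6 & 1 \\ 0 & 0 & 1 & 6 \end{bmatrix}.$$


27 Find the rank and the four eigenvalues of A and C:

$$A = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 \end{bmatrix} \quad\text{and}\quad C = \begin{bmatrix} 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \\ 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \end{bmatrix}.$$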


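28

$$B = A - I = \begin{bmatrix} 0 & 1 & 1 & 1 \\ 1 & 0 & 1 & 1 \\ 1 & 1 & 0 & 1 \\ 1 & 1 & 1 & 0 \end{bmatrix} \quad\text{and}\quad C = I - A = \begin{bmatrix} 0 & -1 & -1 & -1 \\ -1 & 0 & -1 & -1 \\ -1 & -1 & 0 & -1 \\ -1 & -1 & -1 & 0 \end{bmatrix}.$$
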
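29 (Review) Find the eigenvalues of A, B, and C:

$$A = \begin{bmatrix} 1 & 2 & 3 \\ 0 & 4 & 5 \\ 0 & 0 & 6 \end{bmatrix} \quad\text{and}\quad B = \begin{bmatrix} 0 & 0 & 1 \\ 0 & 2 & 0 \\ 3 & 0 & 0 \end{bmatrix} \quad\text{and}\quad C = \begin{bmatrix} 2 & 2 & 2 \\ 2 & 2 & 2 \\ 2 & 2 & 2 \end{bmatrix}.$$


30 When a + b = c + d show that (1, 1) is an eigenvector and find both eigenvalues:

$$A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}.$$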

31 If we exchange rows 1 and 2 and columns 1 and 2, the eigenvalues don’t change.
Find eigenvectors of A and B for λ = 11. Rank one gives λ₂ = λ₃ = 0.

$$A = \begin{bmatrix} 1 & 2 & 1 \\ 3 & 6 & 3 \\ 4 & 8 & 4 \end{bmatrix} \quad\text{and}\quad B = PAP^T = \begin{bmatrix} 6 & 3 & 3 \\ 2 & 1 & 1 \\ 8 & 4 & 4 \end{bmatrix}.$$


32 Suppose A has eigenvalues 0, 3, 5 with independent eigenvectors u, v, w.

(a) Give a basis for the nullspace and a basis for the column space.

(b) Find a particular solution to Ax = v + w. Find all solutions.

(c) Ax = u has no solution. If it did then ____ would be in the column space.


33 Suppose u, v are orthonormal vectors in R², and A = uvᵀ. Compute A² = uvᵀuvᵀ
to discover the eigenvalues of A. Check that the trace of A agrees with λ₁ + λ₂.


34 Find the eigenvalues of this permutation matrix P from det(P − λI) = 0. Which
vectors are not changed by the permutation? They are eigenvectors for λ = 1. Can
you find three more eigenvectors?

$$P = \begin{bmatrix} 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix}$$


Challenge Problems


35 There are six 3 by 3 permutation matrices P. What numbers can be the determinants 
of P? What numbers can be pivots? What numbers can be the trace of P? What 
four numbers can be eigenvalues of P, as in Problem 15?


36 Is there a real 2 by 2 matrix (other than I) with A³ = I? Its eigenvalues must satisfy
λ³ = 1. They can be e^{2πi/3} and e^{−2πi/3}. What trace and determinant would this
give? Construct a rotation matrix as A (which angle of rotation?).


37 (a) Find the eigenvalues and eigenvectors of A. They depend on c:

$$A = \begin{bmatrix} .4 & 1-c \\ .6 & c \end{bmatrix}.$$

(b) Show that A has just one line of eigenvectors when c = 1.6.

(c) This is a Markov matrix when c = .8. Then Aⁿ will approach what matrix A^∞?

6.2 Diagonalizing a Matrix


When x is an eigenvector, multiplication by A is just multiplication by a number λ:
Ax = λx. All the difficulties of matrices are swept away. Instead of an interconnected
system, we can follow the eigenvectors separately. It is like having a diagonal matrix, with
no off-diagonal interconnections. The 100th power of a diagonal matrix is easy.

The point of this section is very direct. The matrix A turns into a diagonal matrix Λ
when we use the eigenvectors properly. This is the matrix form of our key idea. We start
right off with that one essential computation.


The matrix A is “diagonalized.” We use capital lambda for the eigenvalue matrix,
because of the small λ’s (the eigenvalues) on its diagonal.


Proof Multiply A times its eigenvectors, which are the columns of S. The first column of
AS is Ax₁. That is λ₁x₁. Each column of S is multiplied by its eigenvalue λᵢ:

$$\text{A times S} \qquad AS = A\begin{bmatrix} x_1 & \cdots & x_n \end{bmatrix} = \begin{bmatrix} \lambda_1 x_1 & \cdots & \lambda_n x_n \end{bmatrix}.$$

The trick is to split this matrix AS into S times Λ:

$$\text{S times } \Lambda \qquad \begin{bmatrix} \lambda_1 x_1 & \cdots & \lambda_n x_n \end{bmatrix} = \begin{bmatrix} x_1 & \cdots & x_n \end{bmatrix}\begin{bmatrix} \lambda_1 & & \\ & \ddots & \\ & & \lambda_n \end{bmatrix} = S\Lambda.$$

Keep those matrices in the right order! Then λ₁ multiplies the first column x₁, as shown.
The diagonalization is complete, and we can write AS = SΛ in two good ways:

$$AS = S\Lambda \quad\text{is}\quad S^{-1}AS = \Lambda \quad\text{or}\quad A = S\Lambda S^{-1}. \tag{2}$$

The matrix S has an inverse, because its columns (the eigenvectors of A) were assumed to
be linearly independent. Without n independent eigenvectors, we can’t diagonalize.


A and Λ have the same eigenvalues λ₁, ..., λₙ. The eigenvectors are different. The
job of the original eigenvectors x₁, ..., xₙ was to diagonalize A. Those eigenvectors in S
produce A = SΛS⁻¹. You will soon see the simplicity and importance and meaning of
the nth power Aⁿ = SΛⁿS⁻¹.
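
Example 1 This A is triangular so the λ’s are on the diagonal: λ = 1 and λ = 6.

$$\text{Eigenvectors } \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 1 \\ 1 \end{bmatrix} \qquad \underbrace{\begin{bmatrix} 1 & -1 \\ 0 & 1 \end{bmatrix}}_{S^{-1}} \underbrace{\begin{bmatrix} 1 & 5 \\ 0 & 6 \end{bmatrix}}_{A} \underbrace{\begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}}_{S} = \underbrace{\begin{bmatrix} 1 & 0 \\ 0 & 6 \end{bmatrix}}_{\Lambda}$$

In other words A = SΛS⁻¹. Then watch A² = SΛS⁻¹SΛS⁻¹. When you remove
S⁻¹S = I, this becomes SΛ²S⁻¹. Same eigenvectors in S and squared eigenvalues
in Λ².

The kth power will be A^k = SΛ^kS⁻¹ which is easy to compute:

$$\begin{bmatrix} 1 & 5 \\ 0 & 6 \end{bmatrix}^k = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} 1 & 0 \\ 0 & 6^k \end{bmatrix}\begin{bmatrix} 1 & -1 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 6^k - 1 \\ 0 & 6^k \end{bmatrix}.$$

With k = 1 we get A. With k = 0 we get A⁰ = I (and λ⁰ = 1). With k = −1 we get A⁻¹.
You can see how A² = [1 35; 0 36] fits that formula when k = 2.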
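The formula for the kth power can be compared with plain repeated multiplication. The sketch below uses exact integers; the function names `power_by_multiplying` and `power_by_diagonalizing` are illustrative only.

```python
def matmul2(M, N):
    """Product of two 2 by 2 matrices given as lists of rows."""
    return [[sum(M[i][k] * N[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

A = [[1, 5], [0, 6]]
S = [[1, 1], [0, 1]]          # eigenvectors (1, 0) and (1, 1) in the columns
S_inv = [[1, -1], [0, 1]]

def power_by_multiplying(k):
    """A^k from k separate multiplications by A."""
    P = [[1, 0], [0, 1]]
    for _ in range(k):
        P = matmul2(P, A)
    return P

def power_by_diagonalizing(k):
    """A^k = S Lambda^k S^{-1}: only the diagonal entries are raised to k."""
    Lambda_k = [[1, 0], [0, 6 ** k]]
    return matmul2(matmul2(S, Lambda_k), S_inv)
```

For any k ≥ 0 the two functions should agree, and both should match the closed form [1 6^k − 1; 0 6^k].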
Here are four small remarks before we use Λ again.


Remark 1 Suppose the eigenvalues λ₁, ..., λₙ are all different. Then it is automatic that
the eigenvectors x₁, ..., xₙ are independent. Any matrix that has no repeated eigenvalues
can be diagonalized.


Remark 2 We can multiply eigenvectors by any nonzero constants. Ax = λx will remain
true. In Example 1, we can divide the eigenvector (1, 1) by √2 to produce a unit vector.


Remark 3 The eigenvectors in S come in the same order as the eigenvalues in Λ. To
reverse the order in Λ, put (1, 1) before (1, 0) in S:

$$\text{New order 6, 1} \qquad \begin{bmatrix} 0 & 1 \\ 1 & -1 \end{bmatrix}\begin{bmatrix} 1 & 5 \\ 0 & 6 \end{bmatrix}\begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix} = \begin{bmatrix} 6 & 0 \\ 0 & 1 \end{bmatrix} = \Lambda_{\text{new}}$$

To diagonalize A we must use an eigenvector matrix. From S⁻¹AS = Λ we know that
AS = SΛ. Suppose the first column of S is x. Then the first columns of AS and SΛ are
Ax and λ₁x. For those to be equal, x must be an eigenvector.


Remark 4 (repeated warning for repeated eigenvalues) Some matrices have too few
eigenvectors. Those matrices cannot be diagonalized. Here are two examples:

$$\text{Not diagonalizable} \qquad A = \begin{bmatrix} 1 & -1 \\ 1 & -1 \end{bmatrix} \quad\text{and}\quad B = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}.$$

Their eigenvalues happen to be 0 and 0. Nothing is special about λ = 0, it is the repetition
of λ that counts. All eigenvectors of the first matrix are multiples of (1, 1):

$$\text{Only one line of eigenvectors} \qquad Ax = 0x \text{ means } \begin{bmatrix} 1 & -1 \\ 1 & -1 \end{bmatrix}\begin{bmatrix} x \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix} \text{ and } x = c\begin{bmatrix} 1 \\ 1 \end{bmatrix}.$$

There is no second eigenvector, so the unusual matrix A cannot be diagonalized.

Those matrices are the best examples to test any statement about eigenvectors. In many
true-false questions, non-diagonalizable matrices lead to false.

Remember that there is no connection between invertibility and diagonalizability:

- Invertibility is concerned with the eigenvalues (λ = 0 or λ ≠ 0).

- Diagonalizability is concerned with the eigenvectors (too few or enough for S).


Each eigenvalue has at least one eigenvector! A − λI is singular. If (A − λI)x = 0 leads
you to x = 0, λ is not an eigenvalue. Look for a mistake in solving det(A − λI) = 0.
Eigenvectors for n different λ’s are independent. Then we can diagonalize A.


Proof Suppose c₁x₁ + c₂x₂ = 0. Multiply by A to find c₁λ₁x₁ + c₂λ₂x₂ = 0. Multiply
by λ₂ to find c₁λ₂x₁ + c₂λ₂x₂ = 0. Now subtract one from the other:

Subtraction leaves (λ₁ − λ₂)c₁x₁ = 0. Therefore c₁ = 0.

Since the λ’s are different and x₁ ≠ 0, we are forced to this conclusion that c₁ = 0.
Similarly c₂ = 0. No other combination gives c₁x₁ + c₂x₂ = 0, so the eigenvectors x₁
and x₂ must be independent.

This proof extends directly to j eigenvectors. Suppose c₁x₁ + ··· + cⱼxⱼ = 0. Multiply
by A, multiply by λⱼ, and subtract. This removes xⱼ. Now multiply by A and by λⱼ₋₁ and
subtract. This removes xⱼ₋₁. Eventually only x₁ is left:

$$(\lambda_1 - \lambda_2)\cdots(\lambda_1 - \lambda_j)c_1 x_1 = 0 \quad\text{which forces}\quad c_1 = 0. \tag{3}$$

Similarly every cᵢ = 0. When the λ’s are all different, the eigenvectors are independent.
A full set of eigenvectors can go into the columns of the eigenvector matrix S.


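Example 2 Powers of A The Markov matrix A = [.8 .3; .2 .7] in the last section had
λ₁ = 1 and λ₂ = .5. Here is A = SΛS⁻¹ with those eigenvalues in the diagonal Λ:

$$\begin{bmatrix} .8 & .3 \\ .2 & .7 \end{bmatrix} = \begin{bmatrix} .6 & 1 \\ .4 & -1 \end{bmatrix}\begin{bmatrix} 1 & 0 \\ 0 & .5 \end{bmatrix}\begin{bmatrix} 1 & 1 \\ .4 & -.6 \end{bmatrix} = S\Lambda S^{-1}$$

The eigenvectors (.6, .4) and (1, −1) are in the columns of S. They are also the eigenvectors
of A². Watch how A² has the same S, and the eigenvalue matrix of A² is Λ²:

$$\text{Same S for } A^2 \qquad A^2 = S\Lambda S^{-1}S\Lambda S^{-1} = S\Lambda^2 S^{-1}. \tag{4}$$

Just keep going, and you see why the high powers A^k approach a “steady state”:

$$\text{Powers of A} \qquad A^k = S\Lambda^k S^{-1} = \begin{bmatrix} .6 & 1 \\ .4 & -1 \end{bmatrix}\begin{bmatrix} 1^k & 0 \\ 0 & (.5)^k \end{bmatrix}\begin{bmatrix} 1 & 1 \\ .4 & -.6 \end{bmatrix}.$$

As k gets larger, (.5)^k gets smaller. In the limit it disappears completely. That limit is A^∞:

$$\text{Limit } k \to \infty \qquad A^\infty = \begin{bmatrix} .6 & 1 \\ .4 & -1 \end{bmatrix}\begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}\begin{bmatrix} 1 & 1 \\ .4 & -.6 \end{bmatrix} = \begin{bmatrix} .6 & .6 \\ .4 & .4 \end{bmatrix}.$$

The limit has the eigenvector x₁ in both columns. We saw this A^∞ on the very first page
of the chapter. Now we see it coming, from powers like A¹⁰⁰ = SΛ¹⁰⁰S⁻¹.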
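The approach to the steady state can be watched in exact rational arithmetic. This is a minimal sketch using Python's `fractions` module; the function name `markov_power` is an illustrative choice, and S, S⁻¹ are copied from Example 2.

```python
from fractions import Fraction as F

def matmul2(M, N):
    """Product of two 2 by 2 matrices given as lists of rows."""
    return [[sum(M[i][k] * N[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

S = [[F(6, 10), F(1)], [F(4, 10), F(-1)]]        # eigenvectors (.6, .4) and (1, -1)
S_inv = [[F(1), F(1)], [F(4, 10), F(-6, 10)]]

def markov_power(k):
    """A^k = S Lambda^k S^{-1} with eigenvalues 1 and 1/2."""
    Lambda_k = [[F(1), F(0)], [F(0), F(1, 2) ** k]]
    return matmul2(matmul2(S, Lambda_k), S_inv)

A_inf = [[F(6, 10), F(6, 10)], [F(4, 10), F(4, 10)]]
# Largest entrywise distance between A^100 and the limit
gap_100 = max(abs(markov_power(100)[i][j] - A_inf[i][j])
              for i in range(2) for j in range(2))
```

The gap is a multiple of (1/2)¹⁰⁰, so A¹⁰⁰ already agrees with A^∞ to about 30 decimal places.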

Question: When does A^k → zero matrix? Answer: All |λ| < 1.


Fibonacci Numbers 


We present a famous example, where eigenvalues tell how fast the Fibonacci numbers grow. 
Every new Fibonacci number is the sum of the two previous F’s: 


$$F_{k+2} = F_{k+1} + F_k$$


These numbers turn up in a fantastic variety of applications. Plants and trees grow in a 
spiral pattern, and a pear tree has 8 growths for every 3 turns. For a willow those numbers 
can be 13 and 5. The champion is a sunflower of Daniel O’Connell, which had 233 seeds 
in 144 loops. Those are the Fibonacci numbers F₁₃ and F₁₂. Our problem is more basic.


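Problem: Find the Fibonacci number F₁₀₀. The slow way is to apply the rule
F_{k+2} = F_{k+1} + F_k one step at a time. By adding F₆ = 8 to F₇ = 13 we reach F₈ = 21.
Eventually we come to F₁₀₀. Linear algebra gives a better way.

The key is to begin with a matrix equation u_{k+1} = Au_k. That is a one-step rule for
vectors, while Fibonacci gave a two-step rule for scalars. We match those rules by putting
two Fibonacci numbers into a vector. Then you will see the matrix A.

$$\text{Let } u_k = \begin{bmatrix} F_{k+1} \\ F_k \end{bmatrix}. \quad\text{The rule}\quad \begin{aligned} F_{k+2} &= F_{k+1} + F_k \\ F_{k+1} &= F_{k+1} \end{aligned} \quad\text{is}\quad u_{k+1} = \begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix} u_k.$$

Every step multiplies by A = [1 1; 1 0]. After 100 steps we reach u₁₀₀ = A¹⁰⁰u₀:

$$u_0 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \quad u_1 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \quad u_2 = \begin{bmatrix} 2 \\ 1 \end{bmatrix}, \quad u_3 = \begin{bmatrix} 3 \\ 2 \end{bmatrix}, \quad \ldots, \quad u_{100} = \begin{bmatrix} F_{101} \\ F_{100} \end{bmatrix}.$$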


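This problem is just right for eigenvalues. Subtract λ from the diagonal of A:

$$A - \lambda I = \begin{bmatrix} 1-\lambda & 1 \\ 1 & -\lambda \end{bmatrix} \quad\text{leads to}\quad \det(A - \lambda I) = \lambda^2 - \lambda - 1.$$

The equation λ² − λ − 1 = 0 is solved by the quadratic formula (−b ± √(b² − 4ac))/2a:

$$\text{Eigenvalues} \qquad \lambda_1 = \frac{1 + \sqrt{5}}{2} \approx 1.618 \quad\text{and}\quad \lambda_2 = \frac{1 - \sqrt{5}}{2} \approx -.618.$$

These eigenvalues lead to eigenvectors x₁ = (λ₁, 1) and x₂ = (λ₂, 1). Step 2 finds the
combination of those eigenvectors that gives u₀ = (1, 0):

$$\begin{bmatrix} 1 \\ 0 \end{bmatrix} = \frac{1}{\lambda_1 - \lambda_2}\left(\begin{bmatrix} \lambda_1 \\ 1 \end{bmatrix} - \begin{bmatrix} \lambda_2 \\ 1 \end{bmatrix}\right) \quad\text{or}\quad u_0 = \frac{x_1 - x_2}{\lambda_1 - \lambda_2}. \tag{6}$$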

Step 3 multiplies u₀ by A¹⁰⁰ to find u₁₀₀. The eigenvectors x₁ and x₂ stay separate!
They are multiplied by (λ₁)¹⁰⁰ and (λ₂)¹⁰⁰:

$$\text{100 steps from } u_0 \qquad u_{100} = \frac{(\lambda_1)^{100}x_1 - (\lambda_2)^{100}x_2}{\lambda_1 - \lambda_2}. \tag{7}$$

We want F₁₀₀ = second component of u₁₀₀. The second components of x₁ and x₂ are 1.
The difference between (1 + √5)/2 and (1 − √5)/2 is λ₁ − λ₂ = √5. We have F₁₀₀:

$$F_{100} = \frac{1}{\sqrt{5}}\left[\left(\frac{1 + \sqrt{5}}{2}\right)^{100} - \left(\frac{1 - \sqrt{5}}{2}\right)^{100}\right] \approx 3.54 \cdot 10^{20}. \tag{8}$$

Is this a whole number? Yes. The fractions and square roots must disappear, because
Fibonacci’s rule F_{k+2} = F_{k+1} + F_k stays with integers. The second term in (8) is less
than 1/2, so it must move the first term to the nearest whole number:

$$k\text{th Fibonacci number} = \frac{\lambda_1^k - \lambda_2^k}{\lambda_1 - \lambda_2} = \text{nearest integer to } \frac{1}{\sqrt{5}}\left(\frac{1 + \sqrt{5}}{2}\right)^k. \tag{9}$$

The ratio of F₆ to F₅ is 8/5 = 1.6. The ratio F₁₀₁/F₁₀₀ must be very close to the
limiting ratio (1 + √5)/2. The Greeks called this number the “golden mean”.
For some reason a rectangle with sides 1.618 and 1 looks especially graceful.
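Formula (9) is easy to test against the slow way. In the sketch below, `fib_by_steps` applies u_{k+1} = Au_k one step at a time in exact integers, and `fib_by_formula` rounds the first term of (8) in floating point; both names are illustrative. Floating point is an assumption that limits the formula to moderate k (the test range here stops at 40), since a double cannot hold all 21 digits of F₁₀₀.

```python
import math

def fib_by_steps(k):
    """F_k as the second component of u_k = A^k u_0, one step at a time."""
    u = (1, 0)                      # u_0 = (F_1, F_0)
    for _ in range(k):
        u = (u[0] + u[1], u[0])     # multiply by A = [[1, 1], [1, 0]]
    return u[1]

def fib_by_formula(k):
    """Nearest integer to lambda_1^k / sqrt(5), in floating point."""
    lam1 = (1 + math.sqrt(5)) / 2
    return round(lam1 ** k / math.sqrt(5))

f100 = fib_by_steps(100)                          # exact, about 3.54e20
ratio = fib_by_steps(101) / fib_by_steps(100)     # close to the golden mean
```

The exact integer F₁₀₀ agrees with the estimate 3.54 · 10²⁰ in (8).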


Matrix Powers A^k


Fibonacci’s example is a typical difference equation u_{k+1} = Au_k. Each step multiplies
by A. The solution is u_k = A^k u₀. We want to make clear how diagonalizing the matrix
gives a quick way to compute A^k and find u_k in three steps.

The eigenvector matrix S produces A = SΛS⁻¹. This is a factorization of the matrix,
like A = LU or A = QR. The new factorization is perfectly suited to computing powers,
because every time S⁻¹ multiplies S we get I:

$$\text{Powers of A} \qquad A^k u_0 = (S\Lambda S^{-1})\cdots(S\Lambda S^{-1})u_0 = S\Lambda^k S^{-1}u_0$$

I will split SΛ^kS⁻¹u₀ into three steps that show how eigenvalues work:

1. Write u₀ as a combination c₁x₁ + ··· + cₙxₙ of the eigenvectors. Then c = S⁻¹u₀.

2. Multiply each eigenvector xᵢ by (λᵢ)^k. Now we have Λ^kS⁻¹u₀.

3. Add up the pieces cᵢ(λᵢ)^k xᵢ to find the solution u_k = A^k u₀. This is SΛ^kS⁻¹u₀.

$$\text{Solution for } u_{k+1} = Au_k \qquad u_k = A^k u_0 = c_1(\lambda_1)^k x_1 + \cdots + c_n(\lambda_n)^k x_n. \tag{10}$$


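In matrix language A^k equals (SΛS⁻¹)^k which is S times Λ^k times S⁻¹. In Step 1,
the eigenvectors in S lead to the c’s in the combination u₀ = c₁x₁ + ··· + cₙxₙ:

$$\text{Step 1} \qquad u_0 = \begin{bmatrix} x_1 & \cdots & x_n \end{bmatrix}\begin{bmatrix} c_1 \\ \vdots \\ c_n \end{bmatrix}. \quad\text{This says that}\quad u_0 = Sc. \tag{11}$$

The coefficients in Step 1 are c = S⁻¹u₀. Then Step 2 multiplies by Λ^k. The final result
u_k = Σ cᵢ(λᵢ)^k xᵢ in Step 3 is the product of S and Λ^k and S⁻¹u₀:

$$A^k u_0 = S\Lambda^k S^{-1}u_0 = S\Lambda^k c = \begin{bmatrix} x_1 & \cdots & x_n \end{bmatrix}\begin{bmatrix} (\lambda_1)^k & & \\ & \ddots & \\ & & (\lambda_n)^k \end{bmatrix}\begin{bmatrix} c_1 \\ \vdots \\ c_n \end{bmatrix}. \tag{12}$$

This result is exactly u_k = c₁(λ₁)^k x₁ + ··· + cₙ(λₙ)^k xₙ. It solves u_{k+1} = Au_k.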


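Example 3 Start from u₀ = (1, 0). Compute A^k u₀ when S and Λ contain these eigenvectors
and eigenvalues:

$$A = \begin{bmatrix} 1 & 2 \\ 1 & 0 \end{bmatrix} \quad\text{has}\quad \lambda_1 = 2 \text{ and } x_1 = \begin{bmatrix} 2 \\ 1 \end{bmatrix}, \quad \lambda_2 = -1 \text{ and } x_2 = \begin{bmatrix} 1 \\ -1 \end{bmatrix}.$$

This matrix is like Fibonacci except the rule is changed to F_{k+2} = F_{k+1} + 2F_k.
The new numbers start 0, 1, 1, 3. They grow faster from λ = 2.


Solution in three steps Find u₀ = c₁x₁ + c₂x₂ and then u_k = c₁(λ₁)^k x₁ + c₂(λ₂)^k x₂

$$\text{Step 1} \qquad u_0 = \begin{bmatrix} 1 \\ 0 \end{bmatrix} = \frac{1}{3}\begin{bmatrix} 2 \\ 1 \end{bmatrix} + \frac{1}{3}\begin{bmatrix} 1 \\ -1 \end{bmatrix} \quad\text{so}\quad c_1 = c_2 = \frac{1}{3}$$

Step 2 Multiply the two parts by (λ₁)^k = 2^k and (λ₂)^k = (−1)^k

Step 3 Combine eigenvectors c₁(λ₁)^k x₁ and c₂(λ₂)^k x₂ into u_k:

$$u_k = A^k u_0 \qquad u_k = \frac{1}{3}2^k\begin{bmatrix} 2 \\ 1 \end{bmatrix} + \frac{1}{3}(-1)^k\begin{bmatrix} 1 \\ -1 \end{bmatrix}. \tag{13}$$

The new number is F_k = (2^k − (−1)^k)/3. After 0, 1, 1, 3 comes F₄ = 15/3 = 5.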
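The three steps of Example 3 can be checked against the recurrence itself. The sketch below stays in exact integers: the division by 3 in equation (13) is always exact because 2 and −1 differ by 3. The function names are illustrative.

```python
def modified_fib(k):
    """F_k from F_{k+2} = F_{k+1} + 2 F_k with F_0 = 0 and F_1 = 1."""
    a, b = 0, 1                 # (F_0, F_1)
    for _ in range(k):
        a, b = b, b + 2 * a
    return a

def three_step_formula(k):
    """u_k = (1/3) 2^k (2, 1) + (1/3) (-1)^k (1, -1), as exact integers."""
    first = (2 * 2 ** k + (-1) ** k) // 3     # first component, F_{k+1}
    second = (2 ** k - (-1) ** k) // 3        # second component, F_k
    return (first, second)

start = [modified_fib(k) for k in range(5)]   # 0, 1, 1, 3, 5
```

`three_step_formula(k)` returns the pair (F_{k+1}, F_k), the same layout as u_k in the text.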


Behind these numerical examples lies a fundamental idea: Follow the eigenvectors. In
Section 6.3 this is the crucial link from linear algebra to differential equations (powers A^k
will become e^{At}). Chapter 7 sees the same idea as “transforming to an eigenvector basis.”
The best example of all is a Fourier series, built from the eigenvectors of d/dx.

Nondiagonalizable Matrices (Optional)

Suppose λ is an eigenvalue of A. We discover that fact in two ways:

1. Eigenvectors (geometric) There are nonzero solutions to Ax = λx.
2. Eigenvalues (algebraic) The determinant of A − λI is zero.


The number λ may be a simple eigenvalue or a multiple eigenvalue, and we want to know
its multiplicity. Most eigenvalues have multiplicity M = 1 (simple eigenvalues). Then
there is a single line of eigenvectors, and det(A − λI) does not have a double factor.

For exceptional matrices, an eigenvalue can be repeated. Then there are two different
ways to count its multiplicity. Always GM ≤ AM for each λ:

1. (Geometric Multiplicity = GM) Count the independent eigenvectors for λ. This
is the dimension of the nullspace of A − λI.

2. (Algebraic Multiplicity = AM) Count the repetitions of λ among the eigenvalues.
Look at the n roots of det(A − λI) = 0.


If A has λ = 4, 4, 4, that eigenvalue has AM = 3 and GM = 1, 2, or 3.
The following matrix A is the standard example of trouble. Its eigenvalue λ = 0 is
repeated. It is a double eigenvalue (AM = 2) with only one eigenvector (GM = 1).

$$\begin{matrix} \text{AM} = 2 \\ \text{GM} = 1 \end{matrix} \qquad A = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} \text{ has } \det(A - \lambda I) = \begin{vmatrix} -\lambda & 1 \\ 0 & -\lambda \end{vmatrix} = \lambda^2. \qquad \begin{matrix} \lambda = 0, 0 \text{ but} \\ \text{1 eigenvector} \end{matrix}$$

There “should” be two eigenvectors, because λ² = 0 has a double root. The double factor
λ² makes AM = 2. But there is only one eigenvector x = (1, 0). This shortage of
eigenvectors when GM is below AM means that A is not diagonalizable.


The vector called “repeats” in the Teaching Code eigval gives the algebraic multiplicity 
AM for each eigenvalue. When repeats = [1 1... 1] we know that the n eigenvalues are 
all different and A is diagonalizable. The sum of all components in “repeats” is always n, 
because every nth degree equation det(A − λI) = 0 has n roots (counting repetitions).

The diagonal matrix D in the Teaching Code eigvec gives the geometric multiplicity 
GM for each eigenvalue. This counts the independent eigenvectors. The total number of 
independent eigenvectors might be less than n. Then A is not diagonalizable. 


We emphasize again: λ = 0 makes for easy computations, but these three matrices also
have the same shortage of eigenvectors. Their repeated eigenvalue is λ = 5. Traces are 10,
determinants are 25:

$$A = \begin{bmatrix} 5 & 1 \\ 0 & 5 \end{bmatrix} \quad\text{and}\quad A = \begin{bmatrix} 6 & -1 \\ 1 & 4 \end{bmatrix} \quad\text{and}\quad A = \begin{bmatrix} 7 & 2 \\ -2 & 3 \end{bmatrix}.$$

Those all have det(A − λI) = (λ − 5)². The algebraic multiplicity is AM = 2. But each
A − 5I has rank r = 1. The geometric multiplicity is GM = 1. There is only one line of
eigenvectors for λ = 5, and these matrices are not diagonalizable.
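For 2 by 2 integer matrices the two counts are short enough to compute by hand or by a few lines of code. The sketch below is not the Teaching Code eigval or eigvec; `rank2` and `multiplicities` are hypothetical helpers written for this illustration. AM comes from the characteristic polynomial and GM = 2 − rank(A − λI).

```python
def rank2(M):
    """Rank of a 2 by 2 matrix with exact (integer) entries."""
    (a, b), (c, d) = M
    if a * d - b * c != 0:
        return 2
    return 1 if any(v != 0 for v in (a, b, c, d)) else 0

def multiplicities(A, lam):
    """Return (AM, GM) for the eigenvalue lam of a 2 by 2 matrix A."""
    (a, b), (c, d) = A
    trace, det = a + d, a * d - b * c
    if lam * lam - trace * lam + det != 0:
        raise ValueError("lam is not an eigenvalue of A")
    am = 2 if trace == 2 * lam else 1      # the other root is trace - lam
    gm = 2 - rank2([[a - lam, b], [c, d - lam]])
    return am, gm

examples = [[[5, 1], [0, 5]], [[6, -1], [1, 4]], [[7, 2], [-2, 3]]]
results = [multiplicities(A, 5) for A in examples]
```

Each of the three matrices gives AM = 2 and GM = 1. By contrast the diagonal matrix 5I has the same repeated eigenvalue with GM = 2, so repetition alone does not prevent diagonalization.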

Eigenvalues of AB and A+B


The first guess about the eigenvalues of AB is not true. An eigenvalue λ of A times an
eigenvalue β of B usually does not give an eigenvalue of AB:

$$\text{False proof} \qquad ABx = A\beta x = \beta Ax = \beta\lambda x. \tag{14}$$

It seems that β times λ is an eigenvalue. When x is an eigenvector for A and B, this
proof is correct. The mistake is to expect that A and B automatically share the same
eigenvector x. Usually they don’t. Eigenvectors of A are not generally eigenvectors of B.
A and B could have all zero eigenvalues while 1 is an eigenvalue of AB:

$$A = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} \text{ and } B = \begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix}; \text{ then } AB = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} \text{ and } A + B = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}.$$

For the same reason, the eigenvalues of A + B are generally not λ + β. Here λ + β = 0
while A + B has eigenvalues 1 and −1. (At least they add to zero.)
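The counterexample takes only a few lines to confirm. The sketch below reuses the quadratic formula for 2 by 2 matrices with real eigenvalues; `eig2`, `matmul2` and `add2` are illustrative helper names.

```python
import math

def eig2(M):
    """Real eigenvalues of a 2 by 2 matrix, smaller one first."""
    (a, b), (c, d) = M
    trace, det = a + d, a * d - b * c
    r = math.sqrt(trace * trace - 4 * det)   # assumes a nonnegative discriminant
    return ((trace - r) / 2, (trace + r) / 2)

def matmul2(M, N):
    """Product of two 2 by 2 matrices given as lists of rows."""
    return [[sum(M[i][k] * N[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def add2(M, N):
    """Entrywise sum of two 2 by 2 matrices."""
    return [[M[i][j] + N[i][j] for j in range(2)] for i in range(2)]

A = [[0, 1], [0, 0]]
B = [[0, 0], [1, 0]]
AB = matmul2(A, B)
A_plus_B = add2(A, B)
```

A and B each have eigenvalues 0, 0, yet `eig2(AB)` contains 1 and `eig2(A_plus_B)` gives −1 and 1.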

The false proof suggests what is true. Suppose x really is an eigenvector for both A and
B. Then we do have ABx = λβx and BAx = λβx. When all n eigenvectors are shared,
we can multiply eigenvalues. The test AB = BA for shared eigenvectors is important in
quantum mechanics (time out to mention this application of linear algebra):


Heisenberg’s uncertainty principle In quantum mechanics, the position matrix P and
the momentum matrix Q do not commute. In fact QP − PQ = I (these are infinite
matrices). Then we cannot have Px = 0 at the same time as Qx = 0 (unless x = 0).
If we knew the position exactly, we could not also know the momentum exactly.
Problem 28 derives Heisenberg’s uncertainty principle ‖Px‖ ‖Qx‖ ≥ ½‖x‖².


REVIEW OF THE KEY IDEAS


1. If A has n independent eigenvectors x₁, ..., xₙ, they go into the columns of S.
A is diagonalized by S: S⁻¹AS = Λ and A = SΛS⁻¹.

2. The powers of A are A^k = SΛ^kS⁻¹. The eigenvectors in S are unchanged.

3. The eigenvalues of A^k are (λ₁)^k, ..., (λₙ)^k in the matrix Λ^k.

4. The solution to u_{k+1} = Au_k starting from u₀ is u_k = A^k u₀ = SΛ^kS⁻¹u₀:

u_k = c₁(λ₁)^k x₁ + ··· + cₙ(λₙ)^k xₙ provided u₀ = c₁x₁ + ··· + cₙxₙ.

That shows Steps 1, 2, 3 (c’s from S⁻¹u₀, λ^k from Λ^k, and x’s from S).

5. A is diagonalizable if every eigenvalue has enough eigenvectors (GM = AM).


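WORKED EXAMPLES


6.2 A The Lucas numbers are like the Fibonacci numbers except they start with
L₁ = 1 and L₂ = 3. Following the rule L_{k+2} = L_{k+1} + L_k, the next Lucas numbers
are 4, 7, 11, 18. Show that the Lucas number L₁₀₀ is λ₁¹⁰⁰ + λ₂¹⁰⁰.


Note The key point is that λ₁ + λ₂ = 1 and λ₁² + λ₂² = 3, when the λ’s are (1 ± √5)/2.
The Lucas number L_k is λ₁^k + λ₂^k, since this is correct for L₁ and L₂.


Solution u_{k+1} = [1 1; 1 0]u_k is the same as for Fibonacci, because L_{k+2} = L_{k+1} + L_k
is the same rule (with different starting values). The equation becomes a 2 by 2 system:

$$\text{Let } u_k = \begin{bmatrix} L_{k+1} \\ L_k \end{bmatrix}. \quad\text{The rule}\quad \begin{aligned} L_{k+2} &= L_{k+1} + L_k \\ L_{k+1} &= L_{k+1} \end{aligned} \quad\text{is}\quad u_{k+1} = \begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix} u_k.$$

The eigenvalues and eigenvectors of A = [1 1; 1 0] still come from λ² = λ + 1:

$$\lambda_1 = \frac{1 + \sqrt{5}}{2} \text{ and } x_1 = \begin{bmatrix} \lambda_1 \\ 1 \end{bmatrix} \qquad \lambda_2 = \frac{1 - \sqrt{5}}{2} \text{ and } x_2 = \begin{bmatrix} \lambda_2 \\ 1 \end{bmatrix}.$$

Now solve c₁x₁ + c₂x₂ = u₁ = (3, 1). The solution is c₁ = λ₁ and c₂ = λ₂. Check:

$$\lambda_1 x_1 + \lambda_2 x_2 = \begin{bmatrix} \lambda_1^2 + \lambda_2^2 \\ \lambda_1 + \lambda_2 \end{bmatrix} = \begin{bmatrix} \text{trace of } A^2 \\ \text{trace of } A \end{bmatrix} = \begin{bmatrix} 3 \\ 1 \end{bmatrix} = u_1$$

u₁₀₀ = A⁹⁹u₁ tells us the Lucas numbers (L₁₀₁, L₁₀₀). The second components of the
eigenvectors x₁ and x₂ are 1, so the second component of u₁₀₀ is the answer we want:

$$\text{Lucas number} \qquad L_{100} = c_1\lambda_1^{99} + c_2\lambda_2^{99} = \lambda_1^{100} + \lambda_2^{100}.$$

Lucas starts faster than Fibonacci, and ends up larger by a factor near √5.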
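The identity L_k = λ₁^k + λ₂^k can be tested for moderate k. In this sketch `lucas` follows the recurrence in exact integers and `lucas_from_eigenvalues` rounds the floating-point sum of powers; both names are illustrative, and floating point restricts the comparison to k well below 100.

```python
import math

def lucas(k):
    """L_k for k >= 1 from L_1 = 1, L_2 = 3 and L_{k+2} = L_{k+1} + L_k."""
    a, b = 1, 3                 # (L_1, L_2)
    for _ in range(k - 1):
        a, b = b, a + b
    return a

def lucas_from_eigenvalues(k):
    """lambda_1^k + lambda_2^k, rounded to the nearest integer."""
    lam1 = (1 + math.sqrt(5)) / 2
    lam2 = (1 - math.sqrt(5)) / 2
    return round(lam1 ** k + lam2 ** k)

first_six = [lucas(k) for k in range(1, 7)]   # 1, 3, 4, 7, 11, 18
```

For large k the term λ₂^k is tiny, so L_k is essentially λ₁^k, which is why Lucas ends up near √5 times Fibonacci.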


6.2 B Find the inverse and the eigenvalues and the determinant of A:

$$A = 5 * \text{eye}(4) - \text{ones}(4) = \begin{bmatrix} 4 & -1 & -1 & -1 \\ -1 & 4 & -1 & -1 \\ -1 & -1 & 4 & -1 \\ -1 & -1 & -1 & 4 \end{bmatrix}.$$

Describe an eigenvector matrix S that gives S⁻¹AS = Λ.

Solution What are the eigenvalues of the all-ones matrix ones(4)? Its rank is certainly
1, so three eigenvalues are λ = 0, 0, 0. Its trace is 4, so the other eigenvalue is λ = 4.
Subtract this all-ones matrix from 5I to get our matrix A:

Subtract the eigenvalues 4, 0, 0, 0 from 5, 5, 5, 5. The eigenvalues of A are 1, 5, 5, 5.

The determinant of A is 125, the product of those four eigenvalues. The eigenvector for
λ = 1 is x = (1, 1, 1, 1) or (c, c, c, c). The other eigenvectors are perpendicular to x
(since A is symmetric). The nicest eigenvector matrix S is the symmetric orthogonal
Hadamard matrix H (normalized to unit column vectors):

$$\text{Orthonormal eigenvectors} \qquad S = H = \frac{1}{2}\begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & -1 & 1 & -1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \end{bmatrix} = H^T = H^{-1}.$$

The eigenvalues of A⁻¹ are 1, 1/5, 1/5, 1/5. The eigenvectors are not changed so
A⁻¹ = HΛ⁻¹H⁻¹. The inverse matrix is surprisingly neat:

$$A^{-1} = \frac{1}{5} * (\text{eye}(4) + \text{ones}(4)) = \frac{1}{5}\begin{bmatrix} 2 & 1 & 1 & 1 \\ 1 & 2 & 1 & 1 \\ 1 & 1 & 2 & 1 \\ 1 & 1 & 1 & 2 \end{bmatrix}$$

A is a rank-one change from 5I. So A⁻¹ is a rank-one change I/5 + ones/5.

The determinant 125 counts the “spanning trees” in a graph with 5 nodes (all edges
included). Trees have no loops (graphs and trees are in Section 8.2).

With 6 nodes, the matrix 6 * eye(5) − ones(5) has the five eigenvalues 1, 6, 6, 6, 6.

Problem Set 6.2 


Questions 1-7 are about the eigenvalue and eigenvector matrices $\Lambda$ and $S$.

1 (a) Factor these two matrices into $A = S\Lambda S^{-1}$:

$$A = \begin{bmatrix}1&2\\0&3\end{bmatrix}\quad\text{and}\quad A = \begin{bmatrix}1&1\\3&3\end{bmatrix}.$$

(b) If $A = S\Lambda S^{-1}$ then $A^3 = (\ \ )(\ \ )(\ \ )$ and $A^{-1} = (\ \ )(\ \ )(\ \ )$.

2 If $A$ has $\lambda_1 = 2$ with eigenvector $x_1 = \begin{bmatrix}1\\0\end{bmatrix}$ and $\lambda_2 = 5$ with $x_2 = \begin{bmatrix}1\\1\end{bmatrix}$,
use $S\Lambda S^{-1}$ to find $A$. No other matrix has the same $\lambda$'s and $x$'s.

3 Suppose $A = S\Lambda S^{-1}$. What is the eigenvalue matrix for $A + 2I$? What is the
eigenvector matrix? Check that $A + 2I = (\ \ )(\ \ )(\ \ )^{-1}$.

4 True or false: If the columns of $S$ (eigenvectors of $A$) are linearly independent, then

(a) $A$ is invertible (b) $A$ is diagonalizable
(c) $S$ is invertible (d) $S$ is diagonalizable.

5 If the eigenvectors of $A$ are the columns of $I$, then $A$ is a ____ matrix. If the eigenvector
matrix $S$ is triangular, then $S^{-1}$ is triangular. Prove that $A$ is also triangular.

6 Describe all matrices $S$ that diagonalize this matrix $A$ (find all eigenvectors):

$$A = \begin{bmatrix}4&0\\1&2\end{bmatrix}.$$

Then describe all matrices that diagonalize $A^{-1}$.

7 Write down the most general matrix that has eigenvectors $\begin{bmatrix}1\\1\end{bmatrix}$ and $\begin{bmatrix}1\\-1\end{bmatrix}$.


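Questions 8-10 are about Fibonacci and Gibonacci numbers.

8 Diagonalize the Fibonacci matrix by completing $S^{-1}$:

$$\begin{bmatrix}1&1\\1&0\end{bmatrix} = \begin{bmatrix}\lambda_1&\lambda_2\\1&1\end{bmatrix}\begin{bmatrix}\lambda_1&0\\0&\lambda_2\end{bmatrix}\begin{bmatrix}\ \ &\ \ \\ \ \ &\ \ \end{bmatrix}.$$

Do the multiplication $S\Lambda^kS^{-1}\begin{bmatrix}1\\0\end{bmatrix}$ to find its second component. This is the $k$th
Fibonacci number $F_k = (\lambda_1^k - \lambda_2^k)/(\lambda_1 - \lambda_2)$.

9 Suppose $G_{k+2}$ is the average of the two previous numbers $G_{k+1}$ and $G_k$:

$$\begin{aligned}G_{k+2} &= \tfrac12G_{k+1} + \tfrac12G_k\\ G_{k+1} &= G_{k+1}\end{aligned}\quad\text{is}\quad \begin{bmatrix}G_{k+2}\\G_{k+1}\end{bmatrix} = \begin{bmatrix}\ A\ \end{bmatrix}\begin{bmatrix}G_{k+1}\\G_k\end{bmatrix}.$$

(a) Find the eigenvalues and eigenvectors of $A$.
(b) Find the limit as $n \to \infty$ of the matrices $A^n = S\Lambda^nS^{-1}$.
(c) If $G_0 = 0$ and $G_1 = 1$ show that the Gibonacci numbers approach $\frac23$.

10 Prove that every third Fibonacci number in 0, 1, 1, 2, 3, ... is even.

Questions 11-14 are about diagonalizability.

11 True or false: If the eigenvalues of $A$ are 2, 2, 5 then the matrix is certainly
(a) invertible (b) diagonalizable (c) not diagonalizable.

12 True or false: If the only eigenvectors of $A$ are multiples of (1, 4) then $A$ has

(a) no inverse (b) a repeated eigenvalue (c) no diagonalization $S\Lambda S^{-1}$.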
13 Complete these matrices so that $\det A = 25$. Then check that $\lambda = 5$ is repeated:
the trace is 10 so the determinant of $A - \lambda I$ is $(\lambda - 5)^2$. Find an eigenvector with
$Ax = 5x$. These matrices will not be diagonalizable because there is no second line
of eigenvectors.

$$A = \begin{bmatrix}8&\ \ \\ \ \ &2\end{bmatrix}\quad\text{and}\quad A = \begin{bmatrix}9&4\\ \ \ &1\end{bmatrix}\quad\text{and}\quad A = \begin{bmatrix}10&5\\-5&\ \ \end{bmatrix}$$

14 The matrix $A = \begin{bmatrix}3&1\\0&3\end{bmatrix}$ is not diagonalizable because the rank of $A - 3I$ is ____.
Change one entry to make $A$ diagonalizable. Which entries could you change?


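Questions 15-19 are about powers of matrices.

15 $A^k = S\Lambda^kS^{-1}$ approaches the zero matrix as $k \to \infty$ if and only if every $\lambda$ has
absolute value less than ____. Which of these matrices has $A^k \to 0$?

$$A_1 = \begin{bmatrix}.6&.9\\.4&.1\end{bmatrix}\quad\text{and}\quad A_2 = \begin{bmatrix}.6&.9\\.1&.6\end{bmatrix}.$$

16 (Recommended) Find $\Lambda$ and $S$ to diagonalize $A_1$ in Problem 15. What is the limit
of $\Lambda^k$ as $k \to \infty$? What is the limit of $S\Lambda^kS^{-1}$? In the columns of this limiting
matrix you see the ____.

17 Find $\Lambda$ and $S$ to diagonalize $A_2$ in Problem 15. What is $(A_2)^{10}u_0$ for these $u_0$?

$$u_0 = \begin{bmatrix}3\\1\end{bmatrix}\quad\text{and}\quad u_0 = \begin{bmatrix}3\\-1\end{bmatrix}\quad\text{and}\quad u_0 = \begin{bmatrix}6\\0\end{bmatrix}.$$

18 Diagonalize $A$ and compute $S\Lambda^kS^{-1}$ to prove this formula for $A^k$:

$$A = \begin{bmatrix}2&-1\\-1&2\end{bmatrix}\quad\text{has}\quad A^k = \frac12\begin{bmatrix}1+3^k&1-3^k\\1-3^k&1+3^k\end{bmatrix}.$$

19 Diagonalize $B$ and compute $S\Lambda^kS^{-1}$ to prove this formula for $B^k$:

$$B = \begin{bmatrix}5&1\\0&4\end{bmatrix}\quad\text{has}\quad B^k = \begin{bmatrix}5^k&5^k-4^k\\0&4^k\end{bmatrix}.$$

20 Suppose $A = S\Lambda S^{-1}$. Take determinants to prove $\det A = \det\Lambda = \lambda_1\lambda_2\cdots\lambda_n$.
This quick proof only works when $A$ can be ____.

21 Show that trace $ST$ = trace $TS$, by adding the diagonal entries of $ST$ and $TS$:

$$S = \begin{bmatrix}a&b\\c&d\end{bmatrix}\quad\text{and}\quad T = \begin{bmatrix}q&r\\s&t\end{bmatrix}.$$

Choose $T$ as $\Lambda S^{-1}$. Then $S\Lambda S^{-1}$ has the same trace as $\Lambda S^{-1}S = \Lambda$. The trace of
$A$ equals the trace of $\Lambda$ = sum of the eigenvalues.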
22 $AB - BA = I$ is impossible since the left side has trace = ____. But find an
elimination matrix so that $A = E$ and $B = E^{\mathrm T}$ give

$$AB - BA = \begin{bmatrix}-1&0\\0&1\end{bmatrix}\quad\text{which has trace zero.}$$

23 If $A = S\Lambda S^{-1}$, diagonalize the block matrix $B = \begin{bmatrix}A&0\\0&2A\end{bmatrix}$. Find its eigenvalue and
eigenvector (block) matrices.

24 Consider all 4 by 4 matrices $A$ that are diagonalized by the same fixed eigenvector
matrix $S$. Show that the $A$'s form a subspace ($cA$ and $A_1 + A_2$ have this same $S$).
What is this subspace when $S = I$? What is its dimension?

25 Suppose $A^2 = A$. On the left side $A$ multiplies each column of $A$. Which of our four
subspaces contains eigenvectors with $\lambda = 1$? Which subspace contains eigenvectors
with $\lambda = 0$? From the dimensions of those subspaces, $A$ has a full set of independent
eigenvectors. So a matrix with $A^2 = A$ can be diagonalized.

26 (Recommended) Suppose $Ax = \lambda x$. If $\lambda = 0$ then $x$ is in the nullspace. If $\lambda \ne 0$
then $x$ is in the column space. Those spaces have dimensions $(n - r) + r = n$. So
why doesn't every square matrix have $n$ linearly independent eigenvectors?

27 The eigenvalues of $A$ are 1 and 9, and the eigenvalues of $B$ are $-1$ and 9:

$$A = \begin{bmatrix}5&4\\4&5\end{bmatrix}\quad\text{and}\quad B = \begin{bmatrix}4&5\\5&4\end{bmatrix}.$$

Find a matrix square root of $A$ from $R = S\sqrt{\Lambda}\,S^{-1}$. Why is there no real matrix
square root of $B$?

28 (Heisenberg's Uncertainty Principle) $AB - BA = I$ can happen for infinite matrices
with $A = A^{\mathrm T}$ and $B = -B^{\mathrm T}$. Then

$$x^{\mathrm T}x = x^{\mathrm T}ABx - x^{\mathrm T}BAx \le 2\,\|Ax\|\,\|Bx\|.$$

Explain that last step by using the Schwarz inequality. Then Heisenberg's inequality
says that $\|Ax\|/\|x\|$ times $\|Bx\|/\|x\|$ is at least $\frac12$. It is impossible to get the position
error and momentum error both very small.

29 If $A$ and $B$ have the same $\lambda$'s with the same independent eigenvectors, their
factorizations into ____ are the same. So $A = B$.

30 Suppose the same $S$ diagonalizes both $A$ and $B$. They have the same eigenvectors in
$A = S\Lambda_1S^{-1}$ and $B = S\Lambda_2S^{-1}$. Prove that $AB = BA$.

31 (a) If $A = \begin{bmatrix}a&b\\0&d\end{bmatrix}$ then the determinant of $A - \lambda I$ is $(\lambda - a)(\lambda - d)$. Check the
"Cayley-Hamilton Theorem" that $(A - aI)(A - dI)$ = zero matrix.

(b) Test the Cayley-Hamilton Theorem on Fibonacci's $A = \begin{bmatrix}1&1\\1&0\end{bmatrix}$. The theorem
predicts that $A^2 - A - I = 0$, since the polynomial $\det(A - \lambda I)$ is $\lambda^2 - \lambda - 1$.
32 Substitute $A = S\Lambda S^{-1}$ into the product $(A - \lambda_1I)(A - \lambda_2I)\cdots(A - \lambda_nI)$ and
explain why this produces the zero matrix. We are substituting the matrix $A$ for the
number $\lambda$ in the polynomial $p(\lambda) = \det(A - \lambda I)$. The Cayley-Hamilton Theorem
says that this product is always $p(A)$ = zero matrix, even if $A$ is not diagonalizable.

33 Find the eigenvalues and eigenvectors and the $k$th power of $A$. For this "adjacency
matrix" the $i, j$ entry of $A^k$ counts the $k$-step paths from $i$ to $j$.

$$\text{1's in } A \text{ show edges between nodes}\qquad A = \begin{bmatrix}1&1&1\\1&0&0\\1&0&0\end{bmatrix}$$

34 If $A = \begin{bmatrix}1&0\\0&2\end{bmatrix}$ and $AB = BA$, show that $B = \begin{bmatrix}a&b\\c&d\end{bmatrix}$ is also a diagonal matrix. $B$
has the same eigen____ as $A$ but different eigen____. These diagonal matrices
$B$ form a two-dimensional subspace of matrix space. $AB - BA = 0$ gives four
equations for the unknowns $a, b, c, d$: find the rank of the 4 by 4 matrix.

35 The powers $A^k$ approach zero if all $|\lambda_i| < 1$ and they blow up if any $|\lambda_i| > 1$.
Peter Lax gives these striking examples in his book Linear Algebra:

$$A = \begin{bmatrix}3&2\\1&4\end{bmatrix}\quad B = \begin{bmatrix}3&2\\-5&-3\end{bmatrix}\quad C = \begin{bmatrix}5&7\\-3&-4\end{bmatrix}\quad D = \begin{bmatrix}5&6.9\\-3&-4\end{bmatrix}$$

$$\|A^{1024}\| > 10^{700}\qquad B^{1024} = I\qquad C^{1024} = -C\qquad \|D^{1024}\| < 10^{-78}$$

Find the eigenvalues $\lambda = e^{i\theta}$ of $B$ and $C$ to show $B^4 = I$ and $C^3 = -I$.

Challenge Problems

36 The $n$th power of rotation through $\theta$ is rotation through $n\theta$:

$$A^n = \begin{bmatrix}\cos\theta&-\sin\theta\\ \sin\theta&\cos\theta\end{bmatrix}^n = \begin{bmatrix}\cos n\theta&-\sin n\theta\\ \sin n\theta&\cos n\theta\end{bmatrix}.$$

Prove that neat formula by diagonalizing $A = S\Lambda S^{-1}$. The eigenvectors (columns
of $S$) are $(1, i)$ and $(i, 1)$. You need to know Euler's formula $e^{i\theta} = \cos\theta + i\sin\theta$.

37 The transpose of $A = S\Lambda S^{-1}$ is $A^{\mathrm T} = (S^{-1})^{\mathrm T}\Lambda S^{\mathrm T}$. The eigenvectors in $A^{\mathrm T}y = \lambda y$
are the columns of that matrix $(S^{-1})^{\mathrm T}$. They are often called left eigenvectors.
How do you multiply matrices to find this formula for $A$?

$$\text{Sum of rank-1 matrices}\quad A = S\Lambda S^{-1} = \lambda_1x_1y_1^{\mathrm T} + \cdots + \lambda_nx_ny_n^{\mathrm T}.$$

38 The inverse of $A$ = eye(n) + ones(n) is $A^{-1}$ = eye(n) + C * ones(n). Multiply
$AA^{-1}$ to find that number $C$ (depending on $n$).
6.3 Applications to Differential Equations


Eigenvalues and eigenvectors and $A = S\Lambda S^{-1}$ are perfect for matrix powers $A^k$. They are
also perfect for differential equations $du/dt = Au$. This section is mostly linear algebra,
but to read it you need one fact from calculus: The derivative of $e^{\lambda t}$ is $\lambda e^{\lambda t}$. The whole
point of the section is this: To convert constant-coefficient differential equations into
linear algebra.

The ordinary scalar equation $du/dt = u$ is solved by $u = e^t$. The equation $du/dt = 4u$
is solved by $u = e^{4t}$. The solutions are exponentials!

$$\text{One equation}\quad \frac{du}{dt} = \lambda u\quad\text{has the solutions}\quad u(t) = Ce^{\lambda t}.\qquad(1)$$

The number $C$ turns up on both sides of $du/dt = \lambda u$. At $t = 0$ the solution $Ce^{\lambda t}$
reduces to $C$ (because $e^0 = 1$). By choosing $C = u(0)$, the solution that starts from
$u(0)$ at $t = 0$ is $u(t) = u(0)e^{\lambda t}$.

We just solved a 1 by 1 problem. Linear algebra moves to $n$ by $n$. The unknown is
a vector $u$ (now boldface). It starts from the initial vector $u(0)$, which is given. The $n$
equations contain a square matrix $A$. We expect $n$ exponentials $e^{\lambda t}x$ in $u(t)$.


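$$\text{System of } n \text{ equations}\quad \frac{du}{dt} = Au\quad\text{starting from the vector } u(0) \text{ at } t = 0.\qquad(2)$$

These differential equations are linear. If $u(t)$ and $v(t)$ are solutions, so is $Cu(t) + Dv(t)$.
We will need $n$ constants like $C$ and $D$ to match the $n$ components of $u(0)$. Our first job is
to find $n$ "pure exponential solutions" $u = e^{\lambda t}x$ by using $Ax = \lambda x$.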

Notice that $A$ is a constant matrix. In other linear equations, $A$ changes as $t$ changes.
In nonlinear equations, $A$ changes as $u$ changes. We don't have those difficulties.
$du/dt = Au$ is "linear with constant coefficients". Those and only those are the
differential equations that we will convert directly to linear algebra. The main point will be:


Solve linear constant coefficient equations by exponentials $e^{\lambda t}x$, when $Ax = \lambda x$.


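Solution of du/dt = Au


Our pure exponential solution will be $e^{\lambda t}$ times a fixed vector $x$. You may guess that $\lambda$
is an eigenvalue of $A$, and $x$ is the eigenvector. Substitute $u(t) = e^{\lambda t}x$ into the equation
$du/dt = Au$ to prove you are right (the factor $e^{\lambda t}$ will cancel):

$$\text{Choose } u = e^{\lambda t}x \text{ when } Ax = \lambda x\qquad \frac{du}{dt} = \lambda e^{\lambda t}x\quad\text{agrees with}\quad Au = Ae^{\lambda t}x.$$

All components of this special solution $u = e^{\lambda t}x$ share the same $e^{\lambda t}$. The solution
grows when $\lambda > 0$. It decays when $\lambda < 0$. If $\lambda$ is a complex number, its real part decides
growth or decay. The imaginary part $\omega$ gives oscillation $e^{i\omega t}$ like a sine wave.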
Example 1 Solve $\dfrac{du}{dt} = Au = \begin{bmatrix}0&1\\1&0\end{bmatrix}u$ starting from $u(0) = \begin{bmatrix}4\\2\end{bmatrix}$.


This is a vector equation for $u$. It contains two scalar equations for the components $y$ and $z$.
They are "coupled together" because the matrix is not diagonal:

$$\frac{du}{dt} = Au\qquad \frac{d}{dt}\begin{bmatrix}y\\z\end{bmatrix} = \begin{bmatrix}0&1\\1&0\end{bmatrix}\begin{bmatrix}y\\z\end{bmatrix}\quad\text{means that}\quad \frac{dy}{dt} = z\ \text{ and }\ \frac{dz}{dt} = y.$$

The idea of eigenvectors is to combine those equations in a way that gets back to
1 by 1 problems. The combinations $y + z$ and $y - z$ will do it:

$$\frac{d}{dt}(y + z) = z + y\quad\text{and}\quad \frac{d}{dt}(y - z) = -(y - z).$$

The combination $y + z$ grows like $e^t$, because it has $\lambda = 1$. The combination $y - z$ decays
like $e^{-t}$, because it has $\lambda = -1$. Here is the point: We don't have to juggle the original
equations $du/dt = Au$, looking for these special combinations. The eigenvectors and
eigenvalues of $A$ will do it for us.

This matrix $A$ has eigenvalues 1 and $-1$. The eigenvectors are $(1, 1)$ and $(1, -1)$. The
pure exponential solutions $u_1$ and $u_2$ take the form $e^{\lambda t}x$ with $\lambda = 1$ and $-1$:

$$u_1(t) = e^{\lambda_1t}x_1 = e^t\begin{bmatrix}1\\1\end{bmatrix}\quad\text{and}\quad u_2(t) = e^{\lambda_2t}x_2 = e^{-t}\begin{bmatrix}1\\-1\end{bmatrix}.\qquad(4)$$

Notice: These $u$'s are eigenvectors. They satisfy $Au_1 = u_1$ and $Au_2 = -u_2$, just like $x_1$
and $x_2$. The factors $e^t$ and $e^{-t}$ change with time. Those factors give $du_1/dt = u_1 = Au_1$
and $du_2/dt = -u_2 = Au_2$. We have two solutions to $du/dt = Au$. To find all other
solutions, multiply those special solutions by any $C$ and $D$ and add:

$$\text{Complete solution}\quad u(t) = Ce^t\begin{bmatrix}1\\1\end{bmatrix} + De^{-t}\begin{bmatrix}1\\-1\end{bmatrix} = \begin{bmatrix}Ce^t + De^{-t}\\Ce^t - De^{-t}\end{bmatrix}.\qquad(5)$$

With these constants $C$ and $D$, we can match any starting vector $u(0)$. Set $t = 0$ and
$e^0 = 1$. The problem asked for the initial value $u(0) = (4, 2)$:

$$u(0) \text{ gives } C, D\qquad C\begin{bmatrix}1\\1\end{bmatrix} + D\begin{bmatrix}1\\-1\end{bmatrix} = \begin{bmatrix}4\\2\end{bmatrix}\quad\text{yields}\quad C = 3\ \text{ and }\ D = 1.$$


With $C = 3$ and $D = 1$ in the solution (5), the initial value problem is solved.
The same three steps that solved $u_{k+1} = Au_k$ now solve $du/dt = Au$:


1. Write $u(0)$ as a combination $c_1x_1 + \cdots + c_nx_n$ of the eigenvectors of $A$.
2. Multiply each eigenvector $x_i$ by $e^{\lambda_it}$.
3. The solution is the combination of pure solutions $e^{\lambda t}x$:

$$u(t) = c_1e^{\lambda_1t}x_1 + \cdots + c_ne^{\lambda_nt}x_n.\qquad(6)$$

Not included: If two $\lambda$'s are equal, with only one eigenvector, another solution is needed.
(It will be $te^{\lambda t}x$.) Step 1 needs $A = S\Lambda S^{-1}$ to be diagonalizable: a basis of eigenvectors.
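
The three steps can be sketched in a few lines for the 2 by 2 matrix of Example 1. This is only an illustration: the eigenpairs are typed in from the example rather than computed, and the names `coefficients` and `solution` are hypothetical helpers.

```python
import math

# Eigenpairs of A = [[0, 1], [1, 0]], copied from Example 1.
EIGENPAIRS = [(1.0, (1.0, 1.0)), (-1.0, (1.0, -1.0))]

def coefficients(u0):
    """Step 1: solve C*(1,1) + D*(1,-1) = u0 for this particular S."""
    C = (u0[0] + u0[1]) / 2
    D = (u0[0] - u0[1]) / 2
    return C, D

def solution(u0, t):
    """Steps 2 and 3: scale each eigenvector by e^{lambda t}, then add."""
    u = [0.0, 0.0]
    for c, (lam, x) in zip(coefficients(u0), EIGENPAIRS):
        for i in range(2):
            u[i] += c * math.exp(lam * t) * x[i]
    return u

def derivative(u0, t, h=1e-5):
    """Centered difference estimate of du/dt, to compare with A u."""
    up, um = solution(u0, t + h), solution(u0, t - h)
    return [(a - b) / (2 * h) for a, b in zip(up, um)]
```

For `u0 = (4, 2)` the coefficients come out as `C = 3`, `D = 1`, and `derivative` agrees with $Au = (z, y)$ up to the finite difference error.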
Example 2 Solve $du/dt = Au$ knowing the eigenvalues $\lambda = 1, 2, 3$ of $A$:

$$\frac{du}{dt} = \begin{bmatrix}1&1&1\\0&2&1\\0&0&3\end{bmatrix}u\quad\text{starting from}\quad u(0) = \begin{bmatrix}9\\7\\4\end{bmatrix}.$$

The eigenvectors are $x_1 = (1,0,0)$ and $x_2 = (1,1,0)$ and $x_3 = (1,1,1)$.

Step 1 The vector $u(0) = (9,7,4)$ is $2x_1 + 3x_2 + 4x_3$. Thus $(c_1, c_2, c_3) = (2, 3, 4)$.
Step 2 The pure exponential solutions are $e^tx_1$ and $e^{2t}x_2$ and $e^{3t}x_3$.
Step 3 The combination that starts from $u(0)$ is $u(t) = 2e^tx_1 + 3e^{2t}x_2 + 4e^{3t}x_3$.

The coefficients 2, 3, 4 came from solving the linear equation $c_1x_1 + c_2x_2 + c_3x_3 = u(0)$:

$$\begin{bmatrix}x_1&x_2&x_3\end{bmatrix}\begin{bmatrix}c_1\\c_2\\c_3\end{bmatrix} = \begin{bmatrix}1&1&1\\0&1&1\\0&0&1\end{bmatrix}\begin{bmatrix}2\\3\\4\end{bmatrix} = \begin{bmatrix}9\\7\\4\end{bmatrix}\quad\text{which is}\quad Sc = u(0).\qquad(7)$$

You now have the basic idea: how to solve $du/dt = Au$. The rest of this section goes
further. We solve equations that contain second derivatives, because they arise so often in
applications. We also decide whether $u(t)$ approaches zero or blows up or just oscillates.

At the end comes the matrix exponential $e^{At}$. Then $e^{At}u(0)$ solves the equation
$du/dt = Au$ in the same way that $A^ku_0$ solves the equation $u_{k+1} = Au_k$. In fact
we ask whether $u_k$ approaches $u(t)$. Example 3 will show how "difference equations"
help to solve differential equations. You will see real applications.

All these steps use the $\lambda$'s and the $x$'s. This section solves the constant coefficient
problems that turn into linear algebra. It clarifies these simplest but most important
differential equations, whose solution is completely based on $e^{\lambda t}$.


Second Order Equations 


The most important equation in mechanics is $my'' + by' + ky = 0$. The first term is the mass
$m$ times the acceleration $a = y''$. This term $ma$ balances the force $F$ (Newton's Law).
The force includes the damping $-by'$ and the elastic restoring force $-ky$, proportional to
distance moved. This is a second-order equation because it contains the second derivative
$y'' = d^2y/dt^2$. It is still linear with constant coefficients $m, b, k$.

In a differential equations course, the method of solution is to substitute $y = e^{\lambda t}$.
Each derivative brings down a factor $\lambda$. We want $y = e^{\lambda t}$ to solve the equation:

$$m\frac{d^2y}{dt^2} + b\frac{dy}{dt} + ky = 0\quad\text{becomes}\quad (m\lambda^2 + b\lambda + k)e^{\lambda t} = 0.\qquad(8)$$

Everything depends on $m\lambda^2 + b\lambda + k = 0$. This equation for $\lambda$ has two roots $\lambda_1$ and
$\lambda_2$. Then the equation for $y$ has two pure solutions $y_1 = e^{\lambda_1t}$ and $y_2 = e^{\lambda_2t}$. Their
combinations $c_1y_1 + c_2y_2$ give the complete solution unless $\lambda_1 = \lambda_2$.
In a linear algebra course we expect matrices and eigenvalues. Therefore we turn the
scalar equation (with $y''$) into a vector equation (first derivative only). Suppose $m = 1$.
The unknown vector $u$ has components $y$ and $y'$. The equation is $du/dt = Au$:

$$\begin{aligned}dy/dt &= y'\\ dy'/dt &= -ky - by'\end{aligned}\quad\text{converts to}\quad \frac{d}{dt}\begin{bmatrix}y\\y'\end{bmatrix} = \begin{bmatrix}0&1\\-k&-b\end{bmatrix}\begin{bmatrix}y\\y'\end{bmatrix}.\qquad(9)$$

The first equation $dy/dt = y'$ is trivial (but true). The second equation connects $y''$ to $y'$
and $y$. Together the equations connect $u'$ to $u$. So we solve by eigenvalues of $A$:

$$A - \lambda I = \begin{bmatrix}-\lambda&1\\-k&-b-\lambda\end{bmatrix}\quad\text{has determinant}\quad \lambda^2 + b\lambda + k = 0.$$

The equation for the $\lambda$'s is the same! It is still $\lambda^2 + b\lambda + k = 0$, since $m = 1$.
The roots $\lambda_1$ and $\lambda_2$ are now eigenvalues of $A$. The eigenvectors and the solution are

$$x_1 = \begin{bmatrix}1\\ \lambda_1\end{bmatrix}\qquad x_2 = \begin{bmatrix}1\\ \lambda_2\end{bmatrix}\qquad u(t) = c_1e^{\lambda_1t}\begin{bmatrix}1\\ \lambda_1\end{bmatrix} + c_2e^{\lambda_2t}\begin{bmatrix}1\\ \lambda_2\end{bmatrix}.$$

The first component of $u(t)$ has $y = c_1e^{\lambda_1t} + c_2e^{\lambda_2t}$, the same solution as before.
It can't be anything else. In the second component of $u(t)$ you see the velocity $dy/dt$.
The vector problem is completely consistent with the scalar problem.
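
A small sketch can confirm that the roots of $\lambda^2 + b\lambda + k = 0$ are eigenvalues of the 2 by 2 matrix in (9), with eigenvectors $(1, \lambda)$. The function names below are hypothetical, and the sample values of $b$ and $k$ are chosen only for illustration.

```python
import cmath

def companion_eigen(b, k):
    """Eigenpairs of A = [[0, 1], [-k, -b]] from the quadratic formula (m = 1)."""
    disc = cmath.sqrt(b * b - 4 * k)
    lams = [(-b + disc) / 2, (-b - disc) / 2]
    return [(lam, (1, lam)) for lam in lams]

def apply_A(b, k, x):
    """Multiply A = [[0, 1], [-k, -b]] by the vector x."""
    return (x[1], -k * x[0] - b * x[1])

def residual(b, k):
    """Largest entry of Ax - lambda*x over both eigenpairs (should be ~0)."""
    worst = 0.0
    for lam, x in companion_eigen(b, k):
        Ax = apply_A(b, k, x)
        worst = max(worst, abs(Ax[0] - lam * x[0]), abs(Ax[1] - lam * x[1]))
    return worst
```

With $b = 0$, $k = 1$ the eigenvalues are $\pm i$ (pure oscillation); with $b = 4$, $k = 3$ they are $-1$ and $-3$ (decay).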


Example 3 Motion around a circle with $y'' + y = 0$ and $y = \cos t$


This is our master equation with mass $m = 1$ and stiffness $k = 1$ and no damping term $by'$.
Substitute $y = e^{\lambda t}$ into $y'' + y = 0$ to reach $\lambda^2 + 1 = 0$. The roots are $\lambda = i$ and
$\lambda = -i$. Then half of $e^{it} + e^{-it}$ gives the solution $y = \cos t$.

As a first-order system, the initial values $y(0) = 1$, $y'(0) = 0$ go into $u(0) = (1, 0)$:

$$\text{Use } y'' = -y\qquad \frac{du}{dt} = \frac{d}{dt}\begin{bmatrix}y\\y'\end{bmatrix} = \begin{bmatrix}0&1\\-1&0\end{bmatrix}\begin{bmatrix}y\\y'\end{bmatrix} = Au.\qquad(10)$$

The eigenvalues of $A$ are again $\lambda = i$ and $\lambda = -i$ (no surprise). $A$ is anti-symmetric with
eigenvectors $x_1 = (1, i)$ and $x_2 = (1, -i)$. The combination that matches $u(0) = (1, 0)$
is $\frac12(x_1 + x_2)$. Step 2 multiplies the $\frac12$'s by $e^{it}$ and $e^{-it}$. Step 3 combines the pure oscillations
into $u(t)$ to find $y = \cos t$ as expected:

$$u(t) = \frac12e^{it}\begin{bmatrix}1\\i\end{bmatrix} + \frac12e^{-it}\begin{bmatrix}1\\-i\end{bmatrix} = \begin{bmatrix}\cos t\\-\sin t\end{bmatrix}.\quad\text{This is}\ \begin{bmatrix}y(t)\\y'(t)\end{bmatrix}.$$

All good. The vector $u = (\cos t, -\sin t)$ goes around a circle (Figure 6.3). The radius is 1
because $\cos^2t + \sin^2t = 1$.


To display a circle on a screen, replace $y'' = -y$ by a finite difference equation. Here
are three choices using $Y(t+\Delta t) - 2Y(t) + Y(t-\Delta t)$. Divide by $(\Delta t)^2$ to approximate $y''$.

$$\frac{Y_{n+1} - 2Y_n + Y_{n-1}}{(\Delta t)^2} = \begin{cases}-Y_{n-1} & \text{Forward from } n-1\\ -Y_n & \text{Centered at time } n\\ -Y_{n+1} & \text{Backward from } n+1\end{cases}\qquad(11)$$

Figure 6.3 shows the exact $y(t) = \cos t$ completing a circle at $t = 2\pi$. The three difference
methods don't complete a perfect circle in 32 steps of length $\Delta t = 2\pi/32$.
Those pictures will be explained by eigenvalues:


Forward $|\lambda| > 1$ (spiral out)   Centered $|\lambda| = 1$ (best)   Backward $|\lambda| < 1$ (spiral in)


The 2-step equations (11) reduce to 1-step systems. In the continuous case $u$ was
$(y, y')$. Now the discrete unknown is $U_n = (Y_n, Z_n)$ after $n$ time steps $\Delta t$ from $U_0$:

$$\text{Forward}\qquad \begin{aligned}Y_{n+1} &= Y_n + \Delta t\,Z_n\\ Z_{n+1} &= Z_n - \Delta t\,Y_n\end{aligned}\quad\text{becomes}\quad U_{n+1} = \begin{bmatrix}1&\Delta t\\-\Delta t&1\end{bmatrix}\begin{bmatrix}Y_n\\Z_n\end{bmatrix} = AU_n.$$

Those are like $Y' = Z$ and $Z' = -Y$. Eliminating $Z$ will bring back equation (11).
From the equation for $Y_{n+1}$, subtract the same equation for $Y_n$. That produces $Y_{n+1} - Y_n$
on the left side and $Y_n - Y_{n-1}$ on the right side. Also on the right is $\Delta t(Z_n - Z_{n-1})$,
which is $-(\Delta t)^2Y_{n-1}$ from the $Z$ equation. This is the forward choice in equation (11).

My question is simple. Do the points $(Y_n, Z_n)$ stay on the circle $Y^2 + Z^2 = 1$?
They could grow to infinity, they could decay to $(0,0)$. The answer must be found in the
eigenvalues of $A$. $|\lambda|^2$ is $1 + (\Delta t)^2$, the determinant of $A$. Figure 6.3 shows growth!

We are taking powers $A^n$ and not $e^{At}$, so we test the magnitude $|\lambda|$ and not the real
part of $\lambda$.

$$\text{Eigenvalues of } A\qquad \lambda = 1 \pm i\,\Delta t\qquad |\lambda| > 1 \text{ and } (Y_n, Z_n) \text{ spirals out}$$

Figure 6.3: Exact $u = (\cos t, -\sin t)$ on a circle. Forward Euler spirals out (32 steps).
The backward choice in (11) will do the opposite in Figure 6.4. Notice the difference:

$$\text{Backward}\qquad \begin{aligned}Y_{n+1} &= Y_n + \Delta t\,Z_{n+1}\\ Z_{n+1} &= Z_n - \Delta t\,Y_{n+1}\end{aligned}\quad\text{is}\quad \begin{bmatrix}1&-\Delta t\\ \Delta t&1\end{bmatrix}\begin{bmatrix}Y_{n+1}\\Z_{n+1}\end{bmatrix} = \begin{bmatrix}Y_n\\Z_n\end{bmatrix} = U_n.$$

That matrix is $A^{\mathrm T}$. It still has $\lambda = 1 \pm i\,\Delta t$. But now we invert it to reach $U_{n+1}$.
When $A^{\mathrm T}$ has $|\lambda| > 1$, its inverse has $|\lambda| < 1$. That explains why the solution spirals in
to $(0, 0)$ for backward differences.


Figure 6.4: Backward differences spiral in. Leapfrog stays near the circle $Y_n^2 + Z_n^2 = 1$.


On the right side of Figure 6.4 you see 32 steps with the centered choice. The solution
stays close to the circle (Problem 28) if $\Delta t < 2$. This is the leapfrog method. The second
difference $Y_{n+1} - 2Y_n + Y_{n-1}$ "leaps over" the center value $Y_n$.

This is the way a chemist follows the motion of molecules (molecular dynamics leads 
to giant computations). Computational science is lively because one differential equation 
can be replaced by many difference equations—some unstable, some stable, some neutral. 
Problem 30 has a fourth (good) method that stays right on the circle. 
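
The growth and decay of the forward and backward methods can be seen in a short experiment. This is a minimal sketch, assuming the 1-step forms $Y_{n+1} = Y_n + \Delta t\,Z_n$, $Z_{n+1} = Z_n - \Delta t\,Y_n$ (forward) and the corresponding implicit system (backward), starting from $(1, 0)$ with 32 steps of length $2\pi/32$; the function names are hypothetical.

```python
import math

def forward_step(Y, Z, dt):
    """Forward Euler: multiply (Y, Z) by [[1, dt], [-dt, 1]]."""
    return Y + dt * Z, Z - dt * Y

def backward_step(Y, Z, dt):
    """Backward Euler: solve [[1, -dt], [dt, 1]] (Y_new, Z_new) = (Y, Z)."""
    det = 1 + dt * dt
    return (Y + dt * Z) / det, (Z - dt * Y) / det

def radius_after(step, n=32):
    """Distance from the origin after n steps of length 2*pi/n from (1, 0)."""
    dt = 2 * math.pi / n
    Y, Z = 1.0, 0.0
    for _ in range(n):
        Y, Z = step(Y, Z, dt)
    return math.hypot(Y, Z)

r_forward = radius_after(forward_step)    # larger than 1: spiral out
r_backward = radius_after(backward_step)  # smaller than 1: spiral in
```

Each forward step multiplies $Y^2 + Z^2$ by $1 + (\Delta t)^2$ and each backward step divides by it, so the two radii are reciprocals of each other.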


Note Real engineering and real physics deal with systems (not just a single mass at 
one point). The unknown y is a vector. The coefficient of y” is a mass matrix M, 
not a number m. The coefficient of y is a stiffness matrix K, not a number k. The 
coefficient of y’ is a damping matrix which might be zero. 

The equation $My'' + Ky = f$ is a major part of computational mechanics. It is
controlled by the eigenvalues of $M^{-1}K$ in $Kx = \lambda Mx$.


Stability of 2 by 2 Matrices 


For the solution of $du/dt = Au$, there is a fundamental question. Does the solution
approach $u = 0$ as $t \to \infty$? Is the problem stable, by dissipating energy? The solutions in
Examples 1 and 2 included $e^t$ (unstable). Stability depends on the eigenvalues of $A$.

The complete solution $u(t)$ is built from pure solutions $e^{\lambda t}x$. If the eigenvalue $\lambda$ is
real, we know exactly when $e^{\lambda t}$ will approach zero: The number $\lambda$ must be negative.
If the eigenvalue is a complex number $\lambda = r + is$, the real part $r$ must be negative.
When $e^{\lambda t}$ splits into $e^{rt}e^{ist}$, the factor $e^{ist}$ has absolute value fixed at 1:

$$e^{ist} = \cos st + i\sin st\quad\text{has}\quad |e^{ist}|^2 = \cos^2st + \sin^2st = 1.$$

The factor $e^{rt}$ controls growth ($r > 0$ is instability) or decay ($r < 0$ is stability).
The question is: Which matrices have negative eigenvalues? More accurately, when
are the real parts of the $\lambda$'s all negative? 2 by 2 matrices allow a clear answer.


The trace $T = a + d$ must be negative.
The determinant $D = ad - bc$ must be positive.


Reason If the $\lambda$'s are real and negative, their sum is negative. This is the trace $T$. Their
product is positive. This is the determinant $D$. The argument also goes in the reverse
direction. If $D = \lambda_1\lambda_2$ is positive, then $\lambda_1$ and $\lambda_2$ have the same sign. If $T = \lambda_1 + \lambda_2$ is
negative, that sign will be negative. We can test $T$ and $D$.

If the $\lambda$'s are complex numbers, they must have the form $r + is$ and $r - is$.
Otherwise $T$ and $D$ will not be real. The determinant $D$ is automatically positive, since
$(r + is)(r - is) = r^2 + s^2$. The trace $T$ is $r + is + r - is = 2r$. So a negative trace
means that the real part $r$ is negative and the matrix is stable. Q.E.D.

Figure 6.5 shows the parabola $T^2 = 4D$ which separates real from complex eigenvalues.
Solving $\lambda^2 - T\lambda + D = 0$ leads to $\sqrt{T^2 - 4D}$. This is real below the parabola and
imaginary above it. The stable region is the upper left quarter of the figure, where the
trace $T$ is negative and the determinant $D$ is positive.
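
The trace and determinant test is easy to compare against the eigenvalues themselves. The sketch below is an illustration only; `is_stable` and `eigenvalues` are hypothetical names, and the eigenvalues come from the quadratic formula for $\lambda^2 - T\lambda + D = 0$.

```python
import cmath

def is_stable(a, b, c, d):
    """Trace-determinant test for A = [[a, b], [c, d]]: T < 0 and D > 0."""
    T = a + d
    D = a * d - b * c
    return T < 0 and D > 0

def eigenvalues(a, b, c, d):
    """Roots of lambda^2 - T*lambda + D = 0 (complex arithmetic)."""
    T = a + d
    D = a * d - b * c
    root = cmath.sqrt(T * T - 4 * D)
    return (T + root) / 2, (T - root) / 2

def all_real_parts_negative(a, b, c, d):
    return all(lam.real < 0 for lam in eigenvalues(a, b, c, d))

SAMPLES = [
    (0, 1, -3, -4),   # eigenvalues -1, -3: stable
    (0, 1, 1, 0),     # eigenvalues 1, -1: D < 0, unstable
    (0, 1, -1, 0),    # eigenvalues i, -i: T = 0, neutral (not stable)
    (-1, -5, 5, -1),  # eigenvalues -1 +/- 5i: stable spiral
    (2, 0, 0, 3),     # eigenvalues 2, 3: unstable
]
```

On each sample the quick test and the eigenvalue computation give the same verdict.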


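[Figure 6.5: the plane of trace $T$ (horizontal) and determinant $D$ (vertical), divided by the parabola $T^2 = 4D$. For $D > 0$ and $T < 0$: both Re $\lambda < 0$ (above the parabola) or both $\lambda < 0$ (below it), stable. For $D > 0$ and $T > 0$: both Re $\lambda > 0$ or both $\lambda > 0$, unstable. On $T = 0$ with $D > 0$: neutral. $D < 0$ means $\lambda_1 < 0$ and $\lambda_2 > 0$: unstable.]


Figure 6.5: A 2 by 2 matrix is stable ($u(t) \to 0$) when trace < 0 and det > 0.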
The Exponential of a Matrix


We want to write the solution $u(t)$ in a new form $e^{At}u(0)$. This gives a perfect parallel
with $A^ku_0$ in the previous section. First we have to say what $e^{At}$ means, with a matrix in
the exponent. To define $e^{At}$ for matrices, we copy $e^x$ for numbers.

The direct definition of $e^x$ is by the infinite series $1 + x + \frac12x^2 + \frac16x^3 + \cdots$. When
you substitute any square matrix $At$ for $x$, this series defines the matrix exponential $e^{At}$:

$$\text{Matrix exponential } e^{At}\qquad e^{At} = I + At + \tfrac12(At)^2 + \tfrac16(At)^3 + \cdots$$

The number that divides $(At)^n$ is "$n$ factorial". This is $n! = (1)(2)\cdots(n-1)(n)$.
The factorials after 1, 2, 6 are $4! = 24$ and $5! = 120$. They grow quickly. The series
always converges and its derivative is always $Ae^{At}$. Therefore $e^{At}u(0)$ solves the
differential equation with one quick formula, even if there is a shortage of eigenvectors.

I will use this series in Example 4, to see it work with a missing eigenvector.
It will produce $te^{\lambda t}$. First let me reach $Se^{\Lambda t}S^{-1}$ in the good (diagonalizable) case.

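This chapter emphasizes how to find $u(t) = e^{At}u(0)$ by diagonalization. Assume $A$
does have $n$ independent eigenvectors, so it is diagonalizable. Substitute $A = S\Lambda S^{-1}$ into
the series for $e^{At}$. Whenever $S\Lambda S^{-1}S\Lambda S^{-1}$ appears, cancel $S^{-1}S$ in the middle:

$$\begin{aligned}\text{Use the series}\qquad e^{At} &= I + S\Lambda S^{-1}t + \tfrac12(S\Lambda S^{-1}t)(S\Lambda S^{-1}t) + \cdots\\ \text{Factor out } S \text{ and } S^{-1}\qquad &= S\left[I + \Lambda t + \tfrac12(\Lambda t)^2 + \cdots\right]S^{-1}\\ \text{Diagonalize } e^{At}\qquad &= Se^{\Lambda t}S^{-1}.\end{aligned}\qquad(15)$$

That equation says: $e^{At}$ equals $Se^{\Lambda t}S^{-1}$. Then $\Lambda$ is a diagonal matrix and so is $e^{\Lambda t}$.
The numbers $e^{\lambda_it}$ are on its diagonal. Multiply $Se^{\Lambda t}S^{-1}u(0)$ to recognize $u(t)$:

$$e^{At}u(0) = Se^{\Lambda t}S^{-1}u(0) = \begin{bmatrix}x_1&\cdots&x_n\end{bmatrix}\begin{bmatrix}e^{\lambda_1t}&&\\&\ddots&\\&&e^{\lambda_nt}\end{bmatrix}\begin{bmatrix}c_1\\ \vdots\\ c_n\end{bmatrix}.\qquad(16)$$

This solution $e^{At}u(0)$ is the same answer that came in equation (6) from three steps:

1. Write $u(0) = c_1x_1 + \cdots + c_nx_n$. Here we need $n$ independent eigenvectors.
2. Multiply each $x_i$ by $e^{\lambda_it}$ to follow it forward in time.
3. The best form of $e^{At}u(0)$ is $u(t) = c_1e^{\lambda_1t}x_1 + \cdots + c_ne^{\lambda_nt}x_n$.   (17)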
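
To see the series and the diagonalized form agree, here is a minimal sketch for the matrix $A = \begin{bmatrix}0&1\\1&0\end{bmatrix}$ of Example 1. The truncation at 30 terms and the helper names `matmul`, `expm_series` and `expm_diag` are assumptions made for illustration; this is a partial sum, not a production method for matrix exponentials.

```python
import math

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def expm_series(A, t, terms=30):
    """Partial sum I + At + (At)^2/2! + ... using `terms` terms."""
    n = len(A)
    At = [[a * t for a in row] for row in A]
    total = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in total]
    for k in range(1, terms):
        # term becomes (At)^k / k! from (At)^(k-1) / (k-1)!
        term = [[v / k for v in row] for row in matmul(term, At)]
        total = [[total[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return total

def expm_diag(t):
    """S e^{Lambda t} S^{-1} for A = [[0, 1], [1, 0]] with eigenvalues 1, -1."""
    S = [[1.0, 1.0], [1.0, -1.0]]
    S_inv = [[0.5, 0.5], [0.5, -0.5]]
    E = [[math.exp(t), 0.0], [0.0, math.exp(-t)]]
    return matmul(matmul(S, E), S_inv)

A = [[0.0, 1.0], [1.0, 0.0]]
```

For this $A$ both forms reduce to $\begin{bmatrix}\cosh t&\sinh t\\ \sinh t&\cosh t\end{bmatrix}$, which gives a third check.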
Example 4 When you substitute $y = e^{\lambda t}$ into $y'' - 2y' + y = 0$, you get an equation
with repeated roots: $\lambda^2 - 2\lambda + 1 = 0$ is $(\lambda - 1)^2 = 0$. A differential equations course would
propose $e^t$ and $te^t$ as two independent solutions. Here we discover why.

Linear algebra reduces $y'' - 2y' + y = 0$ to a vector equation for $u = (y, y')$:

$$\frac{d}{dt}\begin{bmatrix}y\\y'\end{bmatrix} = \begin{bmatrix}y'\\2y' - y\end{bmatrix}\quad\text{is}\quad \frac{du}{dt} = Au = \begin{bmatrix}0&1\\-1&2\end{bmatrix}u.\qquad(18)$$

The eigenvalues of $A$ are again $\lambda = 1, 1$ (with trace = 2 and $\det A = 1$). The only
eigenvectors are multiples of $x = (1, 1)$. Diagonalization is not possible, $A$ has only one
line of eigenvectors. So we compute $e^{At}$ from its definition as a series:

$$\text{Short series}\qquad e^{At} = e^{It}e^{(A-I)t} = e^t\left[I + (A - I)t\right].\qquad(19)$$

The "infinite" series ends quickly because $(A - I)^2$ is the zero matrix! You can see $te^t$
appearing in equation (19). The first component of $u(t) = e^{At}u(0)$ is our answer $y(t)$:

$$\begin{bmatrix}y\\y'\end{bmatrix} = e^t\left[I + \begin{bmatrix}-1&1\\-1&1\end{bmatrix}t\right]\begin{bmatrix}y(0)\\y'(0)\end{bmatrix}\qquad y(t) = e^ty(0) - te^ty(0) + te^ty'(0).$$
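
The claim that $(A - I)^2$ is the zero matrix can be checked by one multiplication. The short computation below is a worked sketch using only the matrix $A$ of this example:

```latex
A - I = \begin{bmatrix} -1 & 1 \\ -1 & 1 \end{bmatrix}, \qquad
(A - I)^2 = \begin{bmatrix} -1 & 1 \\ -1 & 1 \end{bmatrix}
            \begin{bmatrix} -1 & 1 \\ -1 & 1 \end{bmatrix}
          = \begin{bmatrix} 1 - 1 & -1 + 1 \\ 1 - 1 & -1 + 1 \end{bmatrix}
          = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix},
\qquad
e^{At} = e^{t}\begin{bmatrix} 1 - t & t \\ -t & 1 + t \end{bmatrix}.
```

The first row of that last matrix, applied to $(y(0), y'(0))$, reproduces the three terms in $y(t)$.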


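Example 5 Use the infinite series to find $e^{At}$ for $A = \begin{bmatrix}0&1\\-1&0\end{bmatrix}$. Notice that $A^4 = I$:

$$A = \begin{bmatrix}0&1\\-1&0\end{bmatrix}\quad A^2 = \begin{bmatrix}-1&0\\0&-1\end{bmatrix}\quad A^3 = \begin{bmatrix}0&-1\\1&0\end{bmatrix}\quad A^4 = \begin{bmatrix}1&0\\0&1\end{bmatrix}$$

$A^5, A^6, A^7, A^8$ will repeat these four matrices. The top right corner has $1, 0, -1, 0$
repeating over and over. The infinite series for $e^{At}$ contains $t/1!$, $0$, $-t^3/3!$, $0$.
Then $t - \frac16t^3$ starts that top right corner, and $1 - \frac12t^2$ starts the top left:

$$I + At + \tfrac12(At)^2 + \tfrac16(At)^3 + \cdots = \begin{bmatrix}1 - \frac12t^2 + \cdots & t - \frac16t^3 + \cdots\\ -t + \frac16t^3 - \cdots & 1 - \frac12t^2 + \cdots\end{bmatrix}.$$

On the left side is $e^{At}$. The top row of that matrix shows the series for $\cos t$ and $\sin t$.

$$\text{Rotation matrix}\qquad A = \begin{bmatrix}0&1\\-1&0\end{bmatrix}\qquad e^{At} = \begin{bmatrix}\cos t&\sin t\\-\sin t&\cos t\end{bmatrix}.\qquad(20)$$

$A$ is a skew-symmetric matrix ($A^{\mathrm T} = -A$). Its exponential $e^{At}$ is an orthogonal matrix.
The eigenvalues of $A$ are $i$ and $-i$. The eigenvalues of $e^{At}$ are $e^{it}$ and $e^{-it}$. Three rules:

1 $e^{At}$ always has the inverse $e^{-At}$.

2 The eigenvalues of $e^{At}$ are always $e^{\lambda t}$.

3 When $A$ is skew-symmetric, $e^{At}$ is orthogonal. Inverse = transpose = $e^{-At}$.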
Skew-symmetric matrices have pure imaginary eigenvalues like $\lambda = i\theta$. Then $e^{At}$ has
eigenvalues $e^{i\theta t}$. Their absolute value is 1 (neutral stability, pure oscillation, energy is
conserved).

Our final example has a triangular matrix $A$. Then the eigenvector matrix $S$ is triangular.
So are $S^{-1}$ and $e^{At}$. You will see the two forms of the solution: a combination of
eigenvectors and the short form $e^{At}u(0)$.


Example 6 Solve $\dfrac{du}{dt} = Au = \begin{bmatrix}1&1\\0&2\end{bmatrix}u$ starting from $u(0) = \begin{bmatrix}2\\1\end{bmatrix}$ at $t = 0$.

Solution The eigenvalues 1 and 2 are on the diagonal of $A$ (since $A$ is triangular). The
eigenvectors are $(1, 0)$ and $(1, 1)$. The starting $u(0)$ is $x_1 + x_2$ so $c_1 = c_2 = 1$.
Then $u(t)$ is the same combination of pure exponentials (no $te^{\lambda t}$ when $\lambda = 1, 2$):

$$\text{Solution to } u' = Au\qquad u(t) = e^t\begin{bmatrix}1\\0\end{bmatrix} + e^{2t}\begin{bmatrix}1\\1\end{bmatrix}.$$

That is the clearest form. But the matrix form produces $u(t)$ for every $u(0)$:

$$u(t) = Se^{\Lambda t}S^{-1}u(0)\quad\text{is}\quad \begin{bmatrix}1&1\\0&1\end{bmatrix}\begin{bmatrix}e^t&0\\0&e^{2t}\end{bmatrix}\begin{bmatrix}1&-1\\0&1\end{bmatrix}u(0) = \begin{bmatrix}e^t&e^{2t} - e^t\\0&e^{2t}\end{bmatrix}u(0).$$

That last matrix is $e^{At}$. It's not bad to see what a matrix exponential looks like (this is
a particularly nice one). The situation is the same as for $Ax = b$ and inverses. We don't
really need $A^{-1}$ to find $x$, and we don't need $e^{At}$ to solve $du/dt = Au$. But as quick
formulas for the answers, $A^{-1}b$ and $e^{At}u(0)$ are unbeatable.


REVIEW OF THE KEY IDEAS


1. The equation $u' = Au$ is linear with constant coefficients, starting from $u(0)$.

2. Its solution is usually a combination of exponentials, involving each $\lambda$ and $x$:

$$\text{Independent eigenvectors}\qquad u(t) = c_1e^{\lambda_1t}x_1 + \cdots + c_ne^{\lambda_nt}x_n.$$

3. The constants $c_1, \ldots, c_n$ are determined by $u(0) = c_1x_1 + \cdots + c_nx_n = Sc$.

4. $u(t)$ approaches zero (stability) if every $\lambda$ has negative real part.

5. The solution is always $u(t) = e^{At}u(0)$, with the matrix exponential $e^{At}$.

6. Equations with $y''$ reduce to $u' = Au$ by combining $y'$ and $y$ into $u = (y, y')$.
WORKED EXAMPLES


6.3A Solve $y'' + 4y' + 3y = 0$ by substituting $e^{\lambda t}$ and also by linear algebra.


Solution Substituting $y = e^{\lambda t}$ yields $(\lambda^2 + 4\lambda + 3)e^{\lambda t} = 0$. That quadratic factors into
$\lambda^2 + 4\lambda + 3 = (\lambda + 1)(\lambda + 3) = 0$. Therefore $\lambda_1 = -1$ and $\lambda_2 = -3$. The pure solutions
are $y_1 = e^{-t}$ and $y_2 = e^{-3t}$. The complete solution $c_1y_1 + c_2y_2$ approaches zero.

To use linear algebra we set $u = (y, y')$. Then the vector equation is $u' = Au$:

$$\begin{aligned}dy/dt &= y'\\ dy'/dt &= -3y - 4y'\end{aligned}\quad\text{converts to}\quad \frac{du}{dt} = \begin{bmatrix}0&1\\-3&-4\end{bmatrix}u.$$

This $A$ is called a "companion matrix" and its eigenvalues are again $-1$ and $-3$:

$$\text{Same quadratic}\qquad \det(A - \lambda I) = \begin{vmatrix}-\lambda&1\\-3&-4-\lambda\end{vmatrix} = \lambda^2 + 4\lambda + 3 = 0.$$

The eigenvectors of $A$ are $(1, \lambda_1)$ and $(1, \lambda_2)$. Either way, the decay in $y(t)$ comes from
$e^{-t}$ and $e^{-3t}$. With constant coefficients, calculus goes back to algebra $Ax = \lambda x$.


Note In linear algebra the serious danger is a shortage of eigenvectors. Our eigenvectors
$(1, \lambda_1)$ and $(1, \lambda_2)$ are the same if $\lambda_1 = \lambda_2$. Then we can't diagonalize $A$. In this case we
don't yet have two independent solutions to $du/dt = Au$.

In differential equations the danger is also a repeated $\lambda$. After $y = e^{\lambda t}$ a second
solution has to be found. It turns out to be $y = te^{\lambda t}$. This "impure" solution (with an
extra $t$) appears in the matrix exponential $e^{At}$. Example 4 showed how.


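6.3B Find the eigenvalues and eigenvectors of $A$ and write $u(0) = (0, 2\sqrt2, 0)$ as a
combination of the eigenvectors. Solve both equations $u' = Au$ and $u'' = Au$:

$$\frac{du}{dt} = \begin{bmatrix}-2&1&0\\1&-2&1\\0&1&-2\end{bmatrix}u\quad\text{and}\quad \frac{d^2u}{dt^2} = \begin{bmatrix}-2&1&0\\1&-2&1\\0&1&-2\end{bmatrix}u\ \text{ with }\ \frac{du}{dt}(0) = 0.$$

The $1, -2, 1$ diagonals make $A$ into a second difference matrix (like a second derivative).
$u' = Au$ is like the heat equation $\partial u/\partial t = \partial^2u/\partial x^2$.
Its solution $u(t)$ will decay (negative eigenvalues).
$u'' = Au$ is like the wave equation $\partial^2u/\partial t^2 = \partial^2u/\partial x^2$.
Its solution will oscillate (imaginary eigenvalues).


Solution The eigenvalues and eigenvectors come from $\det(A - \lambda I) = 0$:

$$\det(A - \lambda I) = \begin{vmatrix}-2-\lambda&1&0\\1&-2-\lambda&1\\0&1&-2-\lambda\end{vmatrix} = (-2 - \lambda)\left[(-2 - \lambda)^2 - 2\right] = 0.$$

One eigenvalue is $\lambda = -2$, when $-2 - \lambda$ is zero. The other factor is $\lambda^2 + 4\lambda + 2$, so the
other eigenvalues (also real and negative) are $\lambda = -2 \pm \sqrt2$. Find the eigenvectors:

$$\lambda_1 = -2\qquad (A + 2I)x = \begin{bmatrix}0&1&0\\1&0&1\\0&1&0\end{bmatrix}\begin{bmatrix}x\\y\\z\end{bmatrix} = \begin{bmatrix}0\\0\\0\end{bmatrix}\ \text{ for }\ x_1 = \begin{bmatrix}1\\0\\-1\end{bmatrix}$$

$$\lambda_2 = -2 - \sqrt2\qquad (A - \lambda I)x = \begin{bmatrix}\sqrt2&1&0\\1&\sqrt2&1\\0&1&\sqrt2\end{bmatrix}\begin{bmatrix}x\\y\\z\end{bmatrix} = \begin{bmatrix}0\\0\\0\end{bmatrix}\ \text{ for }\ x_2 = \begin{bmatrix}1\\-\sqrt2\\1\end{bmatrix}$$

$$\lambda_3 = -2 + \sqrt2\qquad (A - \lambda I)x = \begin{bmatrix}-\sqrt2&1&0\\1&-\sqrt2&1\\0&1&-\sqrt2\end{bmatrix}\begin{bmatrix}x\\y\\z\end{bmatrix} = \begin{bmatrix}0\\0\\0\end{bmatrix}\ \text{ for }\ x_3 = \begin{bmatrix}1\\ \sqrt2\\1\end{bmatrix}$$

The eigenvectors are orthogonal (proved in Section 6.4 for all symmetric matrices).
All three $\lambda_i$ are negative. This $A$ is negative definite and $e^{At}$ decays to zero (stability).
The starting $u(0) = (0, 2\sqrt2, 0)$ is $x_3 - x_2$. The solution is $u(t) = e^{\lambda_3t}x_3 - e^{\lambda_2t}x_2$.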
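
A numerical check of these eigenpairs and of the decaying solution takes only a few lines. This is a sketch under the labels used above ($\lambda_2 = -2 - \sqrt2$, $\lambda_3 = -2 + \sqrt2$); the names `residual` and `heat_solution` are hypothetical.

```python
import math

A = [[-2, 1, 0], [1, -2, 1], [0, 1, -2]]
R2 = math.sqrt(2)
EIGENPAIRS = [
    (-2.0, (1.0, 0.0, -1.0)),
    (-2.0 - R2, (1.0, -R2, 1.0)),
    (-2.0 + R2, (1.0, R2, 1.0)),
]

def matvec(M, x):
    return [sum(m * v for m, v in zip(row, x)) for row in M]

def residual(lam, x):
    """Largest entry of Ax - lam*x; near zero for a true eigenpair."""
    return max(abs(a - lam * v) for a, v in zip(matvec(A, x), x))

def heat_solution(t):
    """u(t) = e^{lam3 t} x3 - e^{lam2 t} x2, which starts at (0, 2*sqrt(2), 0)."""
    lam2, x2 = EIGENPAIRS[1]
    lam3, x3 = EIGENPAIRS[2]
    return [math.exp(lam3 * t) * a - math.exp(lam2 * t) * b
            for a, b in zip(x3, x2)]
```

The slowest decay rate is $\lambda_3 = -2 + \sqrt2$, so that term dominates $u(t)$ for large $t$.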


Heat equation In Figure 6.6a, the temperature at the center starts at $2\sqrt2$. Heat diffuses
into the neighboring boxes and then to the outside boxes (frozen at 0°). The rate of heat
flow between boxes is the temperature difference. From box 2, heat flows left and right at
the rate $u_1 - u_2$ and $u_3 - u_2$. So the flow out is $u_1 - 2u_2 + u_3$ in the second row of $Au$.


Figure 6.6: Heat diffuses away from box 2 (left). Wave travels from box 2 (right).


Wave equation $d^2u/dt^2 = Au$ has the same eigenvectors $x$. But now the eigenvalues $\lambda$
lead to oscillations $e^{i\omega t}x$ and $e^{-i\omega t}x$. The frequencies come from $\omega^2 = -\lambda$:

$$\frac{d^2}{dt^2}\left(e^{i\omega t}x\right) = A\left(e^{i\omega t}x\right)\quad\text{becomes}\quad (i\omega)^2e^{i\omega t}x = \lambda e^{i\omega t}x\quad\text{and}\quad \omega^2 = -\lambda.$$

There are two square roots of $-\lambda$, so we have $e^{i\omega t}x$ and $e^{-i\omega t}x$. With three eigenvectors
this makes six solutions to $u'' = Au$. A combination will match the six components of $u(0)$
and $u'(0)$. Since $u' = 0$ in this problem, $e^{i\omega t}x$ combines with $e^{-i\omega t}x$ into $2\cos\omega t\,x$.
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6.3 C Solve the four equations da/dt = 0,db/dt = a,dc/dt = 2b,dz/dt = 3c 
in that order starting from u(0) = G (0), (0), c (0), z(0)). Solve the same equations 
by the matrix exponential in u(t) = e4* u(0). 


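Four equations, $\lambda = 0, 0, 0, 0$ (eigenvalues on the diagonal):

$$\frac{d}{dt}\begin{bmatrix} a \\ b \\ c \\ z \end{bmatrix} = \begin{bmatrix} 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 2 & 0 & 0 \\ 0 & 0 & 3 & 0 \end{bmatrix}\begin{bmatrix} a \\ b \\ c \\ z \end{bmatrix} \quad\text{is}\quad \frac{du}{dt} = Au.$$


First find $A^2, A^3, A^4$ and $e^{At} = I + At + \frac{1}{2}(At)^2 + \frac{1}{6}(At)^3$. Why does the series stop?
Why is it always true that $(e^A)(e^A) = e^{2A}$? Always $e^{As}$ times $e^{At}$ is $e^{A(s+t)}$.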


Solution 1 Integrate $da/dt = 0$, then $db/dt = a$, then $dc/dt = 2b$ and $dz/dt = 3c$:

$$\begin{aligned} a(t) &= a(0) \\ b(t) &= t\,a(0) + b(0) \\ c(t) &= t^2 a(0) + 2t\,b(0) + c(0) \\ z(t) &= t^3 a(0) + 3t^2 b(0) + 3t\,c(0) + z(0) \end{aligned}$$

The 4 by 4 matrix which is multiplying $a(0), b(0), c(0), z(0)$ to produce $a(t), b(t), c(t), z(t)$ must be the same $e^{At}$ as below.


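Solution 2 The powers of $A$ (strictly triangular) are all zero after $A^3$.

$$A = \begin{bmatrix} 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 2 & 0 & 0 \\ 0 & 0 & 3 & 0 \end{bmatrix} \quad A^2 = \begin{bmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 2 & 0 & 0 & 0 \\ 0 & 6 & 0 & 0 \end{bmatrix} \quad A^3 = \begin{bmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 6 & 0 & 0 & 0 \end{bmatrix} \quad A^4 = 0$$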


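The diagonals move down at each step. So the series for $e^{At}$ stops after four terms:

$$\text{Same } e^{At} = I + At + \frac{(At)^2}{2} + \frac{(At)^3}{6} = \begin{bmatrix} 1 & & & \\ t & 1 & & \\ t^2 & 2t & 1 & \\ t^3 & 3t^2 & 3t & 1 \end{bmatrix}$$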
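The finite series is easy to confirm by machine. The following Python sketch is an illustration only, not part of the original text: the function names `matmul` and `exp_At` are assumptions, and exact fractions are used so that no rounding enters. It sums $I + At + (At)^2/2! + \cdots$ for this strictly triangular $A$.

```python
from fractions import Fraction

# The strictly lower triangular matrix of Worked Example 6.3 C.
A = [[0, 0, 0, 0],
     [1, 0, 0, 0],
     [0, 2, 0, 0],
     [0, 0, 3, 0]]


def matmul(X, Y):
    """Product of two square matrices stored as nested lists."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def exp_At(A, t, terms=4):
    """Partial sum I + At + (At)^2/2! + ... keeping `terms` terms, in exact arithmetic."""
    n = len(A)
    t = Fraction(t)
    term = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]  # identity
    total = [row[:] for row in term]
    for k in range(1, terms):
        # Multiply the previous term (At)^(k-1)/(k-1)! by At/k to get (At)^k/k!.
        term = matmul(term, [[t * a / k for a in row] for row in A])
        total = [[total[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return total
```

With `t = 2` the last row should be $t^3, 3t^2, 3t, 1 = 8, 12, 6, 1$. Asking for more than four terms changes nothing, because $A^4 = 0$.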


The square of $e^A$ is always $e^{2A}$ for many reasons:
1. Solving with $e^A$ from $t = 0$ to 1 and then from 1 to 2 agrees with $e^{2A}$ from 0 to 2.
2. The squared series $(I + A + \frac{A^2}{2} + \cdots)^2$ matches $I + 2A + \frac{(2A)^2}{2} + \cdots = e^{2A}$.
3. If $A$ can be diagonalized (this $A$ can't!) then $(Se^{\Lambda}S^{-1})(Se^{\Lambda}S^{-1}) = Se^{2\Lambda}S^{-1}$.


But notice in Problem 23 that $e^Ae^B$ and $e^Be^A$ and $e^{A+B}$ are all different.


Problem Set 6.3


1 Find two $\lambda$'s and $x$'s so that $u = e^{\lambda t}x$ solves

$$\frac{du}{dt} = \begin{bmatrix} 4 & 3 \\ 0 & 1 \end{bmatrix}u.$$

What combination $u = c_1e^{\lambda_1 t}x_1 + c_2e^{\lambda_2 t}x_2$ starts from $u(0) = (5, -2)$?


2 Solve Problem 1 for $u = (y, z)$ by back substitution, $z$ before $y$:

Solve $\dfrac{dz}{dt} = z$ from $z(0) = -2$. Then solve $\dfrac{dy}{dt} = 4y + 3z$ from $y(0) = 5$.

The solution for $y$ will be a combination of $e^{4t}$ and $e^t$. The $\lambda$'s are 4 and 1.


3 (a) If every column of $A$ adds to zero, why is $\lambda = 0$ an eigenvalue?


(b) With negative diagonal and positive off-diagonal adding to zero, $u' = Au$
will be a "continuous" Markov equation. Find the eigenvalues and eigenvectors,
and the steady state as $t \to \infty$


—2 3 


du 
Solve al 2 3 


i |a with u(0) = Hi What is u(0o)? 


4 A door is opened between rooms that hold $v(0) = 30$ people and $w(0) = 10$ people.
The movement between rooms is proportional to the difference $v - w$:

$$\frac{dv}{dt} = w - v \quad\text{and}\quad \frac{dw}{dt} = v - w.$$

Show that the total $v + w$ is constant (40 people). Find the matrix in $du/dt = Au$
and its eigenvalues and eigenvectors. What are $v$ and $w$ at $t = 1$ and $t = \infty$?


5 Reverse the diffusion of people in Problem 4 to $du/dt = -Au$:

$$\frac{dv}{dt} = v - w \quad\text{and}\quad \frac{dw}{dt} = w - v.$$

The total $v + w$ still remains constant. How are the $\lambda$'s changed now that $A$ is
changed to $-A$? But show that $v(t)$ grows to infinity from $v(0) = 30$.


6 $A$ has real eigenvalues but $B$ has complex eigenvalues:


a=| 9 ‘| a=" 5] (a and b are real) 


Find the conditions on $a$ and $b$ so that all solutions of $du/dt = Au$ and
$dv/dt = Bv$ approach zero as $t \to \infty$.


7 Suppose $P$ is the projection matrix onto the 45° line $y = x$ in $\mathbf{R}^2$. What are its
eigenvalues? If $du/dt = -Pu$ (notice minus sign) can you find the limit of $u(t)$ at
$t = \infty$ starting from $u(0) = (3, 1)$?


8 The rabbit population shows fast growth (from $6r$) but loss to wolves (from $-2w$).
The wolf population always grows in this model ($-w^2$ would control wolves):

$$\frac{dr}{dt} = 6r - 2w \quad\text{and}\quad \frac{dw}{dt} = 2r + w.$$

Find the eigenvalues and eigenvectors. If $r(0) = w(0) = 30$ what are the populations
at time $t$? After a long time, what is the ratio of rabbits to wolves?


9 (a) Write $(4, 0)$ as a combination $c_1x_1 + c_2x_2$ of these two eigenvectors of $A$:

$$\begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}\begin{bmatrix} 1 \\ i \end{bmatrix} = i\begin{bmatrix} 1 \\ i \end{bmatrix} \qquad \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}\begin{bmatrix} 1 \\ -i \end{bmatrix} = -i\begin{bmatrix} 1 \\ -i \end{bmatrix}.$$

(b) The solution to $du/dt = Au$ starting from $(4, 0)$ is $c_1e^{it}x_1 + c_2e^{-it}x_2$.
Substitute $e^{it} = \cos t + i\sin t$ and $e^{-it} = \cos t - i\sin t$ to find $u(t)$.


Questions 10-13 reduce second-order equations to first-order systems for (y, y’). 


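10 Find $A$ to change the scalar equation $y'' = 5y' + 4y$ into a vector equation for
$u = (y, y')$:

$$\frac{du}{dt} = \begin{bmatrix} y' \\ y'' \end{bmatrix} = \begin{bmatrix} \quad & \quad \\ \quad & \quad \end{bmatrix}\begin{bmatrix} y \\ y' \end{bmatrix} = Au.$$

What are the eigenvalues of $A$? Find them also by substituting $y = e^{\lambda t}$ into $y'' = 5y' + 4y$.


11 The solution to $y'' = 0$ is a straight line $y = C + Dt$. Convert to a matrix equation:

$$\frac{d}{dt}\begin{bmatrix} y \\ y' \end{bmatrix} = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}\begin{bmatrix} y \\ y' \end{bmatrix} \quad\text{has the solution}\quad \begin{bmatrix} y \\ y' \end{bmatrix} = e^{At}\begin{bmatrix} y(0) \\ y'(0) \end{bmatrix}.$$

This matrix $A$ has $\lambda = 0, 0$ and it cannot be diagonalized. Find $A^2$ and compute
$e^{At} = I + At + \frac{1}{2}A^2t^2 + \cdots$. Multiply your $e^{At}$ times $(y(0), y'(0))$ to check the
straight line $y(t) = y(0) + y'(0)t$.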


12 Substitute $y = e^{\lambda t}$ into $y'' = 6y' - 9y$ to show that $\lambda = 3$ is a repeated root. This
is trouble; we need a second solution after $e^{3t}$. The matrix equation is

$$\frac{d}{dt}\begin{bmatrix} y \\ y' \end{bmatrix} = \begin{bmatrix} 0 & 1 \\ -9 & 6 \end{bmatrix}\begin{bmatrix} y \\ y' \end{bmatrix}.$$

Show that this matrix has $\lambda = 3, 3$ and only one line of eigenvectors. Trouble here
too. Show that the second solution to $y'' = 6y' - 9y$ is $y = te^{3t}$.


13 (a) Write down two familiar functions that solve the equation $d^2y/dt^2 = -9y$.
Which one starts with $y(0) = 3$ and $y'(0) = 0$?
(b) This second-order equation $y'' = -9y$ produces a vector equation $u' = Au$:

$$u = \begin{bmatrix} y \\ y' \end{bmatrix} \qquad \frac{du}{dt} = \begin{bmatrix} y' \\ y'' \end{bmatrix} = \begin{bmatrix} 0 & 1 \\ -9 & 0 \end{bmatrix}\begin{bmatrix} y \\ y' \end{bmatrix} = Au.$$

Find $u(t)$ by using the eigenvalues and eigenvectors of $A$: $u(0) = (3, 0)$.


14 The matrix in this question is skew-symmetric ($A^T = -A$):

$$\frac{du}{dt} = \begin{bmatrix} 0 & c & -b \\ -c & 0 & a \\ b & -a & 0 \end{bmatrix}u \quad\text{or}\quad \begin{aligned} u_1' &= cu_2 - bu_3 \\ u_2' &= au_3 - cu_1 \\ u_3' &= bu_1 - au_2. \end{aligned}$$

(a) The derivative of $\|u(t)\|^2 = u_1^2 + u_2^2 + u_3^2$ is $2u_1u_1' + 2u_2u_2' + 2u_3u_3'$.
Substitute $u_1', u_2', u_3'$ to get zero. Then $\|u(t)\|^2$ stays equal to $\|u(0)\|^2$.


(b) When $A$ is skew-symmetric, $Q = e^{At}$ is orthogonal. Prove $Q^T = e^{-At}$ from
the series for $Q = e^{At}$. Then $Q^TQ = I$.


15 A particular solution to $du/dt = Au - b$ is $u_p = A^{-1}b$, if $A$ is invertible. The
usual solutions to $du/dt = Au$ give $u_n$. Find the complete solution $u = u_p + u_n$:


du du 1 0 4 
a) a =u-4 (b) ae le- fél: 
16 If $c$ is not an eigenvalue of $A$, substitute $u = e^{ct}v$ and find a particular solution to
$du/dt = Au - e^{ct}b$. How does it break down when $c$ is an eigenvalue of $A$? The
"nullspace" of $du/dt = Au$ contains the usual solutions $e^{\lambda_i t}x_i$.


17 Find a matrix $A$ to illustrate each of the unstable regions in Figure 6.5:
(a) $\lambda_1 < 0$ and $\lambda_2 > 0$ (b) $\lambda_1 > 0$ and $\lambda_2 > 0$ (c) $\lambda = a \pm ib$ with $a > 0$.


Questions 18-27 are about the matrix exponential $e^{At}$.


18 Write five terms of the infinite series for $e^{At}$. Take the $t$ derivative of each term.
Show that you have four terms of $Ae^{At}$. Conclusion: $e^{At}u_0$ solves $u' = Au$.


19 The matrix B = [$—¢] has B? = 0. Find eBt from a (short) infinite series. 
Check that the derivative of $e^{Bt}$ is $Be^{Bt}$.


20 Starting from $u(0)$ the solution at time $T$ is $e^{AT}u(0)$. Go an additional time $t$ to
reach $e^{At}e^{AT}u(0)$. This solution at time $t + T$ can also be written as ____.
Conclusion: $e^{At}$ times $e^{AT}$ equals ____.


21 Write $A = \begin{bmatrix} 1 & 4 \\ 0 & 0 \end{bmatrix}$ in the form $S\Lambda S^{-1}$. Find $e^{At}$ from $Se^{\Lambda t}S^{-1}$.


22 If $A^2 = A$ show that the infinite series produces $e^{At} = I + (e^t - 1)A$. For $A = \begin{bmatrix} 1 & 4 \\ 0 & 0 \end{bmatrix}$
in Problem 21 this gives $e^{At} = $ ____.


23 Generally $e^Ae^B$ is different from $e^Be^A$. They are both different from $e^{A+B}$.
Check this using Problems 21-22 and 19. (If $AB = BA$, all three are the same.)


ba se] eee f 


24 Write $A$ = [41] as $S\Lambda S^{-1}$. Multiply $Se^{\Lambda t}S^{-1}$ to find the matrix exponential $e^{At}$.
Check $e^{At}$ and the derivative of $e^{At}$ when $t = 0$.


Put A = [13] into the infinite series to find e^t, First compute A? and A’: 


“BY Deal Jeo ] 


26 Give two reasons why the matrix exponential $e^{At}$ is never singular:

(a) Write down its inverse.

(b) Write down its eigenvalues. If $Ax = \lambda x$ then $e^{At}x = $ ____ $x$.


27 Find a solution $x(t), y(t)$ that gets large as $t \to \infty$. To avoid this instability a
scientist exchanged the two equations:

$$\begin{aligned} dx/dt &= 0x - 4y \\ dy/dt &= -2x + 2y \end{aligned} \quad\text{becomes}\quad \begin{aligned} dy/dt &= -2x + 2y \\ dx/dt &= 0x - 4y. \end{aligned}$$

Now the matrix $\begin{bmatrix} -2 & 2 \\ 0 & -4 \end{bmatrix}$ is stable. It has negative eigenvalues. How can this be?


Challenge Problems 


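28 Centering $y'' = -y$ in Example 3 will produce $Y_{n+1} - 2Y_n + Y_{n-1} = -(\Delta t)^2Y_n$.
This can be written as a one-step difference equation for $U = (Y, Z)$:

$$\begin{aligned} Y_{n+1} &= Y_n + \Delta t\, Z_n \\ Z_{n+1} &= Z_n - \Delta t\, Y_{n+1} \end{aligned} \qquad \begin{bmatrix} 1 & 0 \\ \Delta t & 1 \end{bmatrix}\begin{bmatrix} Y_{n+1} \\ Z_{n+1} \end{bmatrix} = \begin{bmatrix} 1 & \Delta t \\ 0 & 1 \end{bmatrix}\begin{bmatrix} Y_n \\ Z_n \end{bmatrix}$$

Invert the matrix on the left side to write this as $U_{n+1} = AU_n$. Show that $\det A = 1$.
Choose the large time step $\Delta t = 1$ and find the eigenvalues $\lambda_1$ and $\lambda_2 = \bar{\lambda}_1$ of $A$:

$$A = \begin{bmatrix} 1 & 1 \\ -1 & 0 \end{bmatrix} \text{ has } |\lambda_1| = |\lambda_2| = 1. \text{ Show that } A^6 \text{ is exactly } I.$$

After 6 steps to $t = 6$, $U_6$ equals $U_0$. The exact $y = \cos t$ returns to 1 at $t = 2\pi$.


29 That centered choice (leapfrog method) in Problem 28 is very successful for small
time steps $\Delta t$. But find the eigenvalues of $A$ for $\Delta t = \sqrt{2}$ and 2:

$$A = \begin{bmatrix} 1 & \sqrt{2} \\ -\sqrt{2} & -1 \end{bmatrix} \quad\text{and}\quad A = \begin{bmatrix} 1 & 2 \\ -2 & -3 \end{bmatrix}.$$

Both matrices have $|\lambda| = 1$. Compute $A^4$ in both cases and find the eigenvectors
of $A$. That value $\Delta t = 2$ is at the border of instability. Time steps $\Delta t > 2$ will lead
to $|\lambda| > 1$, and the powers in $U_n = A^nU_0$ will explode.


Note You might say that nobody would compute with $\Delta t > 2$. But if an atom
vibrates with $y'' = -1000000y$, then $\Delta t > .0002$ will give instability. Leapfrog has
a very strict stability limit. $Y_{n+1} = Y_n + 3Z_n$ and $Z_{n+1} = Z_n - 3Y_{n+1}$ will explode
because $\Delta t = 3$ is too large.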


30 Another good idea for $y'' = -y$ is the trapezoidal method (half forward/half back):
This may be the best way to keep $(Y_n, Z_n)$ exactly on a circle.

$$\text{Trapezoidal} \qquad \begin{bmatrix} 1 & -\Delta t/2 \\ \Delta t/2 & 1 \end{bmatrix}\begin{bmatrix} Y_{n+1} \\ Z_{n+1} \end{bmatrix} = \begin{bmatrix} 1 & \Delta t/2 \\ -\Delta t/2 & 1 \end{bmatrix}\begin{bmatrix} Y_n \\ Z_n \end{bmatrix}$$

(a) Invert the left matrix to write this equation as $U_{n+1} = AU_n$. Show that $A$ is
an orthogonal matrix: $A^TA = I$. These points $U_n$ never leave the circle.
$A = (I - B)^{-1}(I + B)$ is always an orthogonal matrix if $B^T = -B$.

(b) (Optional MATLAB) Take 32 steps from $U_0 = (1, 0)$ to $U_{32}$ with $\Delta t = 2\pi/32$.
Is $U_{32} = U_0$? I think there is a small error.


31 The cosine of a matrix is defined like $e^A$, by copying the series for $\cos t$:

$$\cos t = 1 - \frac{1}{2!}t^2 + \frac{1}{4!}t^4 - \cdots \qquad \cos A = I - \frac{1}{2!}A^2 + \frac{1}{4!}A^4 - \cdots$$

(a) If $Ax = \lambda x$, multiply each term times $x$ to find the eigenvalue of $\cos A$.

(b) Find the eigenvalues of $A = \begin{bmatrix} \pi & \pi \\ \pi & \pi \end{bmatrix}$ with eigenvectors $(1, 1)$ and $(1, -1)$.
From the eigenvalues and eigenvectors of $\cos A$, find that matrix $C = \cos A$.

(c) The second derivative of $\cos(At)$ is $-A^2\cos(At)$.

$u(t) = \cos(At)\,u(0)$ solves $\dfrac{d^2u}{dt^2} = -A^2u$ starting from $u'(0) = 0$.

Construct $u(t) = \cos(At)\,u(0)$ by the usual three steps for that specific $A$:
1. Expand $u(0) = (4, 2) = c_1x_1 + c_2x_2$ in the eigenvectors.
2. Multiply those eigenvectors by ____ and ____ (instead of $e^{\lambda t}$).
3. Add up the solution $u(t) = c_1$ ____ $x_1 + c_2$ ____ $x_2$.


6.4 Symmetric Matrices


For projection onto a plane in $\mathbf{R}^3$, the plane is full of eigenvectors (where $Px = x$). The
other eigenvectors are perpendicular to the plane (where $Px = 0$). The eigenvalues
$\lambda = 1, 1, 0$ are real. Three eigenvectors can be chosen perpendicular to each other. I have
to write "can be chosen" because the two in the plane are not automatically perpendicular.
This section makes that best possible choice for symmetric matrices: The eigenvectors of
$P = P^T$ are perpendicular unit vectors.

Now we open up to all symmetric matrices. It is no exaggeration to say that these 
are the most important matrices the world will ever see—in the theory of linear algebra 
and also in the applications. We come immediately to the key question about symmetry. 
Not only the question, but also the answer. 

What is special about $Ax = \lambda x$ when $A$ is symmetric? We are looking for special
properties of the eigenvalues $\lambda$ and the eigenvectors $x$ when $A = A^T$.

The diagonalization $A = S\Lambda S^{-1}$ will reflect the symmetry of $A$. We get some hint by
transposing to $A^T = (S^{-1})^T\Lambda S^T$. Those are the same since $A = A^T$. Possibly $S^{-1}$ in the
first form equals $S^T$ in the second form. Then $S^TS = I$. That makes each eigenvector in
$S$ orthogonal to the other eigenvectors. The key facts get first place in the Table at the end
of this chapter, and here they are:

1. A symmetric matrix has only real eigenvalues.
2. The eigenvectors can be chosen orthonormal.


Those $n$ orthonormal eigenvectors go into the columns of $S$. Every symmetric matrix can
be diagonalized. Its eigenvector matrix $S$ becomes an orthogonal matrix $Q$. Orthogonal
matrices have $Q^{-1} = Q^T$, so what we suspected about $S$ is true. To remember it we write
$S = Q$, when we choose orthonormal eigenvectors.

Why do we use the word "choose"? Because the eigenvectors do not have to be unit
vectors. Their lengths are at our disposal. We will choose unit vectors: eigenvectors of
length one, which are orthonormal and not just orthogonal. Then $S\Lambda S^{-1}$ is in its special
and particular form $Q\Lambda Q^T$ for symmetric matrices:


(Spectral Theorem) Symmetric diagonalization $A = Q\Lambda Q^{-1} = Q\Lambda Q^T$ with $Q^{-1} = Q^T$


It is easy to see that $Q\Lambda Q^T$ is symmetric. Take its transpose. You get $(Q^T)^T\Lambda^TQ^T$, which
is $Q\Lambda Q^T$ again. The harder part is to prove that every symmetric matrix has real $\lambda$'s and
orthonormal $x$'s. This is the "spectral theorem" in mathematics and the "principal axis
theorem" in geometry and physics. We have to prove it! No choice. I will approach the
proof in three steps:


1. By an example, showing real $\lambda$'s in $\Lambda$ and orthonormal $x$'s in $Q$.
2. By a proof of those facts when no eigenvalues are repeated.
3. By a proof that allows repeated eigenvalues (at the end of this section).


Example 1 Find the $\lambda$'s and $x$'s when $A = \begin{bmatrix} 1 & 2 \\ 2 & 4 \end{bmatrix}$ and $A - \lambda I = \begin{bmatrix} 1-\lambda & 2 \\ 2 & 4-\lambda \end{bmatrix}$.


Solution The determinant of $A - \lambda I$ is $\lambda^2 - 5\lambda$. The eigenvalues are 0 and 5 (both real).
We can see them directly: $\lambda = 0$ is an eigenvalue because $A$ is singular, and $\lambda = 5$ matches
the trace down the diagonal of $A$: $0 + 5$ agrees with $1 + 4$.

Two eigenvectors are $(2, -1)$ and $(1, 2)$, orthogonal but not yet orthonormal. The
eigenvector for $\lambda = 0$ is in the nullspace of $A$. The eigenvector for $\lambda = 5$ is in the column
space. We ask ourselves, why are the nullspace and column space perpendicular? The
Fundamental Theorem says that the nullspace is perpendicular to the row space, not the
column space. But our matrix is symmetric! Its row and column spaces are the same. Its
eigenvectors $(2, -1)$ and $(1, 2)$ must be (and are) perpendicular.

These eigenvectors have length $\sqrt{5}$. Divide them by $\sqrt{5}$ to get unit vectors. Put those
into the columns of $S$ (which is $Q$). Then $Q^{-1}AQ$ is $\Lambda$ and $Q^{-1} = Q^T$:

$$Q^{-1}AQ = \frac{1}{\sqrt{5}}\begin{bmatrix} 2 & -1 \\ 1 & 2 \end{bmatrix}\begin{bmatrix} 1 & 2 \\ 2 & 4 \end{bmatrix}\frac{1}{\sqrt{5}}\begin{bmatrix} 2 & 1 \\ -1 & 2 \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ 0 & 5 \end{bmatrix} = \Lambda.$$
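That product can be multiplied out by hand, or checked in a few lines. The Python sketch below is only an illustration under my own naming (`transpose`, `matmul`, `Lam`, `QTQ` are not from the text). It forms $Q^TAQ$ and $Q^TQ$ for the matrix of Example 1.

```python
import math

A = [[1.0, 2.0],
     [2.0, 4.0]]

s = math.sqrt(5.0)
# Columns of Q are the unit eigenvectors (2, -1)/sqrt(5) and (1, 2)/sqrt(5).
Q = [[2.0 / s, 1.0 / s],
     [-1.0 / s, 2.0 / s]]


def transpose(M):
    """Rows become columns."""
    return [list(col) for col in zip(*M)]


def matmul(X, Y):
    """Matrix product for nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


QT = transpose(Q)
Lam = matmul(matmul(QT, A), Q)   # expected: diagonal with 0 and 5
QTQ = matmul(QT, Q)              # expected: the identity
```

Up to rounding, `Lam` holds the eigenvalues 0 and 5 on its diagonal and `QTQ` is the identity, so $Q^{-1} = Q^T$ for this $Q$.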


Now comes the $n$ by $n$ case. The $\lambda$'s are real when $A = A^T$ and $Ax = \lambda x$.


Real Eigenvalues All the eigenvalues of a real symmetric matrix are real.


Proof Suppose that $Ax = \lambda x$. Until we know otherwise, $\lambda$ might be a complex number
$a + ib$ ($a$ and $b$ real). Its complex conjugate is $\bar{\lambda} = a - ib$. Similarly the components
of $x$ may be complex numbers, and switching the signs of their imaginary parts gives $\bar{x}$.
The good thing is that $\bar{\lambda}$ times $\bar{x}$ is always the conjugate of $\lambda$ times $x$. So we can take
conjugates of $Ax = \lambda x$, remembering that $A$ is real:

$$Ax = \lambda x \quad\text{leads to}\quad A\bar{x} = \bar{\lambda}\bar{x}. \quad\text{Transpose to}\quad \bar{x}^TA = \bar{x}^T\bar{\lambda}. \tag{1}$$

Now take the dot product of the first equation with $\bar{x}$ and the last equation with $x$:

$$\bar{x}^TAx = \bar{x}^T\lambda x \quad\text{and also}\quad \bar{x}^TAx = \bar{x}^T\bar{\lambda}x. \tag{2}$$

The left sides are the same so the right sides are equal. One equation has $\lambda$, the other
has $\bar{\lambda}$. They multiply $\bar{x}^Tx = |x_1|^2 + |x_2|^2 + \cdots$ = length squared which is not zero.
Therefore $\lambda$ must equal $\bar{\lambda}$, and $a + ib$ equals $a - ib$. The imaginary part is $b = 0$. Q.E.D.

The eigenvectors come from solving the real equation $(A - \lambda I)x = 0$. So the $x$'s are
also real. The important fact is that they are perpendicular.


Proof Suppose $Ax = \lambda_1x$ and $Ay = \lambda_2y$. We are assuming here that $\lambda_1 \ne \lambda_2$. Take
dot products of the first equation with $y$ and the second with $x$:

$$\text{Use } A^T = A \qquad (\lambda_1x)^Ty = (Ax)^Ty = x^TA^Ty = x^TAy = x^T\lambda_2y. \tag{3}$$

The left side is $x^T\lambda_1y$, the right side is $x^T\lambda_2y$. Since $\lambda_1 \ne \lambda_2$, this proves that $x^Ty = 0$.
The eigenvector $x$ (for $\lambda_1$) is perpendicular to the eigenvector $y$ (for $\lambda_2$).


Example 2 The eigenvectors of a 2 by 2 symmetric matrix have a special form:

$$\text{Not widely known} \qquad A = \begin{bmatrix} a & b \\ b & c \end{bmatrix} \text{ has } x_1 = \begin{bmatrix} b \\ \lambda_1 - a \end{bmatrix} \text{ and } x_2 = \begin{bmatrix} \lambda_2 - c \\ b \end{bmatrix}. \tag{4}$$

This is in the Problem Set. The point here is that $x_1$ is perpendicular to $x_2$:

$$x_1^Tx_2 = b(\lambda_2 - c) + (\lambda_1 - a)b = b(\lambda_1 + \lambda_2 - a - c) = 0.$$

This is zero because $\lambda_1 + \lambda_2$ equals the trace $a + c$. Thus $x_1^Tx_2 = 0$. Eagle eyes might
notice the special case $a = c$, $b = 0$ when $x_1 = x_2 = 0$. This case has repeated
eigenvalues, as in $A = I$. It still has perpendicular eigenvectors $(1, 0)$ and $(0, 1)$.

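This example shows the main goal of this section: to diagonalize symmetric matrices
$A$ by orthogonal eigenvector matrices $S = Q$. Look again at the result:

$$\text{Symmetry} \qquad A = S\Lambda S^{-1} \text{ becomes } A = Q\Lambda Q^T \text{ with } Q^TQ = I.$$

This says that every 2 by 2 symmetric matrix looks like

$$A = Q\Lambda Q^T = \begin{bmatrix} x_1 & x_2 \end{bmatrix}\begin{bmatrix} \lambda_1 & \\ & \lambda_2 \end{bmatrix}\begin{bmatrix} x_1^T \\ x_2^T \end{bmatrix}. \tag{5}$$

The columns $x_1$ and $x_2$ multiply the rows $\lambda_1x_1^T$ and $\lambda_2x_2^T$ to produce $A$:

$$\text{Sum of rank-one matrices} \qquad A = \lambda_1x_1x_1^T + \lambda_2x_2x_2^T. \tag{6}$$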
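Equations (4) and (6) fit together in a short computation. The Python sketch below is my own illustration (the names `spectral_pieces` and `rebuild` are assumptions, and it assumes $b \ne 0$ so that the vectors in (4) are not zero). It takes the eigenvalues from the quadratic formula, the eigenvectors from (4), and adds the two rank-one pieces $\lambda_i x_i x_i^T$ with unit vectors $x_i$.

```python
import math


def spectral_pieces(a, b, c):
    """For A = [[a, b], [b, c]] with b != 0, return (lam1, lam2, P1, P2).

    P1 and P2 are the rank-one matrices x x^T / (x^T x) built from the
    eigenvectors x1 = (b, lam1 - a) and x2 = (lam2 - c, b) of equation (4).
    """
    mean = (a + c) / 2.0
    radius = math.hypot((a - c) / 2.0, b)
    lam1, lam2 = mean + radius, mean - radius
    x1 = (b, lam1 - a)
    x2 = (lam2 - c, b)

    def projection(x):
        n2 = x[0] * x[0] + x[1] * x[1]
        return [[x[0] * x[0] / n2, x[0] * x[1] / n2],
                [x[1] * x[0] / n2, x[1] * x[1] / n2]]

    return lam1, lam2, projection(x1), projection(x2)


def rebuild(a, b, c):
    """Add up lam1 * P1 + lam2 * P2, which should reproduce A."""
    lam1, lam2, P1, P2 = spectral_pieces(a, b, c)
    return [[lam1 * P1[i][j] + lam2 * P2[i][j] for j in range(2)]
            for i in range(2)]
```

For the matrix of Example 1, `rebuild(1.0, 2.0, 4.0)` should return the entries 1, 2, 2, 4 up to rounding, with eigenvalues 5 and 0.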


This is the great factorization $Q\Lambda Q^T$, written in terms of $\lambda$'s and $x$'s. When the symmetric
matrix is $n$ by $n$, there are $n$ columns in $Q$ multiplying $n$ rows in $Q^T$. The $n$ products $x_ix_i^T$
are projection matrices. Including the $\lambda$'s, the spectral theorem $A = Q\Lambda Q^T$ for symmetric
matrices says that $A$ is a combination of projection matrices:

$$A = \lambda_1P_1 + \cdots + \lambda_nP_n \qquad \lambda_i = \text{eigenvalue}, \quad P_i = \text{projection onto eigenspace}.$$


Complex Eigenvalues of Real Matrices


Equation (1) went from $Ax = \lambda x$ to $A\bar{x} = \bar{\lambda}\bar{x}$. In the end, $\lambda$ and $x$ were real. Those two
equations were the same. But a nonsymmetric matrix can easily produce $\lambda$ and $x$ that are
complex. In this case, $A\bar{x} = \bar{\lambda}\bar{x}$ is different from $Ax = \lambda x$. It gives us a new eigenvalue
(which is $\bar{\lambda}$) and a new eigenvector (which is $\bar{x}$):


If $Ax = \lambda x$ then $A\bar{x} = \bar{\lambda}\bar{x}$.


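Example 3 $A = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}$ has $\lambda_1 = \cos\theta + i\sin\theta$ and $\lambda_2 = \cos\theta - i\sin\theta$.


Those eigenvalues are conjugate to each other. They are $\lambda$ and $\bar{\lambda}$. The eigenvectors
must be $x$ and $\bar{x}$, because $A$ is real:

$$\begin{aligned} \text{This is } \lambda x \qquad Ax &= \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}\begin{bmatrix} 1 \\ -i \end{bmatrix} = (\cos\theta + i\sin\theta)\begin{bmatrix} 1 \\ -i \end{bmatrix} \\ \text{This is } \bar{\lambda}\bar{x} \qquad A\bar{x} &= \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}\begin{bmatrix} 1 \\ i \end{bmatrix} = (\cos\theta - i\sin\theta)\begin{bmatrix} 1 \\ i \end{bmatrix}. \end{aligned} \tag{7}$$

Those eigenvectors $(1, -i)$ and $(1, i)$ are complex conjugates because $A$ is real.

For this rotation matrix the absolute value is $|\lambda| = 1$, because $\cos^2\theta + \sin^2\theta = 1$.
This fact $|\lambda| = 1$ holds for the eigenvalues of every orthogonal matrix.

We apologize that a touch of complex numbers slipped in. They are unavoidable even
when the matrix is real. Chapter 10 goes beyond complex numbers $\lambda$ and complex vectors
to complex matrices $A$. Then you have the whole picture.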
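Python's built-in complex numbers make the conjugate pair easy to test. This sketch is an illustration of my own (the angle `theta = 0.6` and the names `rotation`, `matvec`, `err_x`, `err_x_bar` are arbitrary assumptions). It checks $Ax = \lambda x$ for $x = (1, -i)$ and $A\bar{x} = \bar{\lambda}\bar{x}$ for $\bar{x} = (1, i)$.

```python
import cmath
import math


def rotation(theta):
    """The 2 by 2 rotation matrix through the angle theta."""
    return [[math.cos(theta), -math.sin(theta)],
            [math.sin(theta), math.cos(theta)]]


def matvec(M, v):
    """2 by 2 matrix times a vector with (possibly complex) entries."""
    return [M[0][0] * v[0] + M[0][1] * v[1],
            M[1][0] * v[0] + M[1][1] * v[1]]


theta = 0.6                       # any angle will do
A = rotation(theta)
lam = cmath.exp(1j * theta)       # cos(theta) + i sin(theta)
lam_bar = lam.conjugate()         # cos(theta) - i sin(theta)
x = [1, -1j]
x_bar = [1, 1j]

# Largest entry of |A x - lam x| and of |A x_bar - lam_bar x_bar|.
err_x = max(abs(p - lam * q) for p, q in zip(matvec(A, x), x))
err_x_bar = max(abs(p - lam_bar * q) for p, q in zip(matvec(A, x_bar), x_bar))
```

Both errors should be at rounding level, and `abs(lam)` should equal 1, as it does for the eigenvalues of every orthogonal matrix.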

We end with two optional discussions. 


Eigenvalues versus Pivots 


The eigenvalues of $A$ are very different from the pivots. For eigenvalues, we solve
$\det(A - \lambda I) = 0$. For pivots, we use elimination. The only connection so far is this:


product of pivots = determinant = product of eigenvalues.


We are assuming a full set of pivots $d_1, \ldots, d_n$. There are $n$ real eigenvalues $\lambda_1, \ldots, \lambda_n$.
The $d$'s and $\lambda$'s are not the same, but they come from the same matrix. This paragraph is
about a hidden relation. For symmetric matrices the pivots and the eigenvalues have the
same signs:


The number of positive eigenvalues of $A = A^T$ equals the number of positive pivots.
Special case: $A$ has all $\lambda_i > 0$ if and only if all pivots are positive.


That special case is an all-important fact for positive definite matrices in Section 6.5.

Example 4 This symmetric matrix $A$ has one positive eigenvalue and one positive pivot:

$$\text{Matching signs} \qquad A = \begin{bmatrix} 1 & 3 \\ 3 & 1 \end{bmatrix} \text{ has pivots 1 and } -8, \text{ eigenvalues 4 and } -2.$$

The signs of the pivots match the signs of the eigenvalues, one plus and one minus.
This could be false when the matrix is not symmetric:

$$\text{Opposite signs} \qquad B = \begin{bmatrix} 1 & 6 \\ -1 & -4 \end{bmatrix} \text{ has pivots 1 and 2, eigenvalues } -1 \text{ and } -2.$$

The diagonal entries are a third set of numbers and we say nothing about them.
Here is a proof that the pivots and eigenvalues have matching signs, when $A = A^T$.


You see it best when the pivots are divided out of the rows of $U$. Then $A$ is $LDL^T$.
The diagonal pivot matrix $D$ goes between triangular matrices $L$ and $L^T$:

$$\begin{bmatrix} 1 & 3 \\ 3 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 3 & 1 \end{bmatrix}\begin{bmatrix} 1 & \\ & -8 \end{bmatrix}\begin{bmatrix} 1 & 3 \\ 0 & 1 \end{bmatrix} \qquad \text{This is } A = LDL^T. \text{ It is symmetric.}$$


Watch the eigenvalues when $L$ and $L^T$ move toward the identity matrix: $A \to D$.


The eigenvalues of $LDL^T$ are 4 and $-2$. The eigenvalues of $IDI^T$ are 1 and $-8$ (the
pivots!). The eigenvalues are changing, as the "3" in $L$ moves to zero. But to change sign,
a real eigenvalue would have to cross zero. The matrix would at that moment be singular.
Our changing matrix always has pivots 1 and $-8$, so it is never singular. The signs cannot
change, as the $\lambda$'s move to the $d$'s.

We repeat the proof for any $A = LDL^T$. Move $L$ toward $I$, by moving the off-diagonal
entries to zero. The pivots are not changing and not zero. The eigenvalues $\lambda$ of
$LDL^T$ change to the eigenvalues $d$ of $IDI^T$. Since these eigenvalues cannot cross zero as
they move into the pivots, their signs cannot change. Q.E.D.
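The movement of the eigenvalues can be watched numerically. The Python sketch below is an illustration, not the book's code: the names `sym_eigs`, `ldlt` and `path`, and the choice of 30 equal steps, are my assumptions. It shrinks the off-diagonal entry $s$ of $L$ from 3 to 0 with $D = \mathrm{diag}(1, -8)$ and records both eigenvalues of $LDL^T$ at every step.

```python
import math


def sym_eigs(a, b, c):
    """Eigenvalues (larger, smaller) of the symmetric matrix [[a, b], [b, c]]."""
    mean = (a + c) / 2.0
    radius = math.hypot((a - c) / 2.0, b)
    return mean + radius, mean - radius


def ldlt(s, d1=1.0, d2=-8.0):
    """Entries (a, b, c) of L D L^T for L = [[1, 0], [s, 1]], D = diag(d1, d2)."""
    # L D L^T = [[d1, s*d1], [s*d1, s*s*d1 + d2]]
    return d1, s * d1, s * s * d1 + d2


# Move the "3" in L to zero in 30 equal steps and track both eigenvalues.
path = [sym_eigs(*ldlt(3.0 * (1.0 - k / 30.0))) for k in range(31)]
```

The path starts at the eigenvalues $(4, -2)$ of $A$ and ends at the pivots $(1, -8)$. At every step one eigenvalue stays positive and the other stays negative, so neither crosses zero.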

This connects the two halves of applied linear algebra—pivots and eigenvalues. 


All Symmetric Matrices are Diagonalizable 


When no eigenvalues of $A$ are repeated, the eigenvectors are sure to be independent.
Then $A$ can be diagonalized. But a repeated eigenvalue can produce a shortage of
eigenvectors. This sometimes happens for nonsymmetric matrices. It never happens
for symmetric matrices. There are always enough eigenvectors to diagonalize $A = A^T$.


Here is one idea for a proof. Change $A$ slightly by a diagonal matrix $\mathrm{diag}(c, 2c, \ldots, nc)$.
If $c$ is very small, the new symmetric matrix will have no repeated eigenvalues. Then we
know it has a full set of orthonormal eigenvectors. As $c \to 0$ we obtain $n$ orthonormal
eigenvectors of the original $A$, even if some eigenvalues of that $A$ are repeated.

Every mathematician knows that this argument is incomplete. How do we guarantee
that the small diagonal matrix will separate the eigenvalues? (I am sure this is true.)

A different proof comes from a useful new factorization that applies to all matrices,
symmetric or not. This new factorization immediately produces $A = Q\Lambda Q^T$ with a full
set of real orthonormal eigenvectors when $A$ is any symmetric matrix.


Every square matrix factors into $A = QTQ^{-1}$ where $T$ is upper triangular and $\bar{Q}^T = Q^{-1}$.
If $A$ has real eigenvalues then $Q$ and $T$ can be chosen real: $Q^TQ = I$.


This is Schur's Theorem. We are looking for $AQ = QT$. The first column $q_1$ of $Q$ must
be a unit eigenvector of $A$. Then the first columns of $AQ$ and $QT$ are $Aq_1$ and $t_{11}q_1$. But
the other columns of $Q$ need not be eigenvectors when $T$ is only triangular (not diagonal).
So use any $n - 1$ columns that complete $q_1$ to a matrix $Q_1$ with orthonormal columns. At
this point only the first columns of $Q$ and $T$ are set, where $Aq_1 = t_{11}q_1$:

$$\bar{Q}_1^TAQ_1 = \begin{bmatrix} \bar{q}_1^T \\ \vdots \\ \bar{q}_n^T \end{bmatrix}\begin{bmatrix} Aq_1 & \cdots & Aq_n \end{bmatrix} = \begin{bmatrix} t_{11} & \cdots \\ \mathbf{0} & A_2 \end{bmatrix}. \tag{8}$$

Now I will argue by "induction". Assume Schur's factorization $A_2 = Q_2T_2Q_2^{-1}$ is
possible for that matrix $A_2$ of size $n - 1$. Put the orthogonal (or unitary) matrix $Q_2$ and the
triangular $T_2$ into the final $Q$ and $T$:

$$Q = Q_1\begin{bmatrix} 1 & 0 \\ 0 & Q_2 \end{bmatrix} \quad\text{and}\quad T = \begin{bmatrix} t_{11} & \cdots \\ 0 & T_2 \end{bmatrix} \quad\text{and}\quad AQ = QT \text{ as desired.}$$


Note I had to allow $q_1$ and $Q_1$ to be complex, in case $A$ has complex eigenvalues.
But if $t_{11}$ is a real eigenvalue, then $q_1$ and $Q_1$ can stay real. The induction step keeps
everything real when $A$ has real eigenvalues. Induction starts with 1 by 1, no problem.


Proof that $T$ is the diagonal $\Lambda$ when $A$ is symmetric. Then we have $A = Q\Lambda Q^T$.


Every symmetric $A$ has real eigenvalues. Schur's $A = QTQ^T$ with $Q^TQ = I$ means that
$T = Q^TAQ$. This is a symmetric matrix (its transpose is $Q^TAQ$). Now the key point:
If $T$ is triangular and also symmetric, it must be diagonal: $T = \Lambda$.


This proves $A = Q\Lambda Q^T$. The matrix $A = A^T$ has $n$ orthonormal eigenvectors.


■ REVIEW OF THE KEY IDEAS ■


1. A symmetric matrix has real eigenvalues and perpendicular eigenvectors.
2. Diagonalization becomes $A = Q\Lambda Q^T$ with an orthogonal matrix $Q$.
3. All symmetric matrices are diagonalizable, even with repeated eigenvalues.
4. The signs of the eigenvalues match the signs of the pivots, when $A = A^T$.
5. Every square matrix can be "triangularized" by $A = QTQ^{-1}$.


■ WORKED EXAMPLES ■


6.4 A What matrix $A$ has eigenvalues $\lambda = 1, -1$ and eigenvectors $x_1 = (\cos\theta, \sin\theta)$
and $x_2 = (-\sin\theta, \cos\theta)$? Which of these properties can be predicted in advance?

$$A = A^T \qquad A^2 = I \qquad \det A = -1 \qquad \text{+ and − pivot} \qquad A^{-1} = A$$


Solution All those properties can be predicted! With real eigenvalues in $\Lambda$ and orthonormal
eigenvectors in $Q$, the matrix $A = Q\Lambda Q^T$ must be symmetric. The eigenvalues
1 and $-1$ tell us that $A^2 = I$ (since $\lambda^2 = 1$) and $A^{-1} = A$ (same thing) and $\det A = -1$.
The two pivots are positive and negative like the eigenvalues, since $A$ is symmetric.

The matrix must be a reflection. Vectors in the direction of $x_1$ are unchanged by $A$
(since $\lambda = 1$). Vectors in the perpendicular direction are reversed (since $\lambda = -1$). The
reflection $A = Q\Lambda Q^T$ is across the "$\theta$-line". Write $c$ for $\cos\theta$, $s$ for $\sin\theta$:

$$A = \begin{bmatrix} c & -s \\ s & c \end{bmatrix}\begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}\begin{bmatrix} c & s \\ -s & c \end{bmatrix} = \begin{bmatrix} c^2 - s^2 & 2cs \\ 2cs & s^2 - c^2 \end{bmatrix} = \begin{bmatrix} \cos 2\theta & \sin 2\theta \\ \sin 2\theta & -\cos 2\theta \end{bmatrix}.$$

Notice that $x = (1, 0)$ goes to $Ax = (\cos 2\theta, \sin 2\theta)$ on the $2\theta$-line. And $(\cos 2\theta, \sin 2\theta)$
goes back across the $\theta$-line to $x = (1, 0)$.
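The predicted properties are simple to test for one angle. In the Python sketch below, which is an illustration only (the function name `reflection` and the test angle are my choices), $A$ is assembled as $\sum_k \lambda_k q_k q_k^T$ from the two eigenvectors, so it can be compared with the closed form in $\cos 2\theta$ and $\sin 2\theta$.

```python
import math


def reflection(theta):
    """A = Q Lambda Q^T with eigenvalues 1, -1 and eigenvectors
    (cos theta, sin theta) and (-sin theta, cos theta) in the columns of Q."""
    c, s = math.cos(theta), math.sin(theta)
    Q = [[c, -s],
         [s, c]]
    lam = [1.0, -1.0]
    # Entry (i, j) of Q Lambda Q^T is the sum over k of Q[i][k] * lam[k] * Q[j][k].
    return [[sum(Q[i][k] * lam[k] * Q[j][k] for k in range(2)) for j in range(2)]
            for i in range(2)]


theta = 0.4
A = reflection(theta)
A_squared = [[sum(A[i][k] * A[k][j] for k in range(2)) for j in range(2)]
             for i in range(2)]
det_A = A[0][0] * A[1][1] - A[0][1] * A[1][0]
```

Here `A` should agree with the matrix of $\cos 2\theta$ and $\sin 2\theta$ above, `A_squared` should be the identity, and `det_A` should be $-1$.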


6.4 B Find the eigenvalues of $A_3$ and $B_4$, and check the orthogonality of their first two
eigenvectors. Graph these eigenvectors to see discrete sines and cosines:

$$A_3 = \begin{bmatrix} 2 & -1 & 0 \\ -1 & 2 & -1 \\ 0 & -1 & 2 \end{bmatrix} \qquad B_4 = \begin{bmatrix} 1 & -1 & & \\ -1 & 2 & -1 & \\ & -1 & 2 & -1 \\ & & -1 & 1 \end{bmatrix}$$

The $-1, 2, -1$ pattern in both matrices is a "second difference". Section 8.1 will explain
how this is like a second derivative. Then $Ax = \lambda x$ and $Bx = \lambda x$ are like $d^2x/dt^2 = \lambda x$.
This has eigenvectors $x = \sin kt$ and $x = \cos kt$ that are the bases for Fourier series. The
matrices lead to "discrete sines" and "discrete cosines" that are the bases for the Discrete
Fourier Transform. This DFT is absolutely central to all areas of digital signal processing.
The favorite choice for JPEG in image processing has been $B_8$ of size 8.


Solution The eigenvalues of $A_3$ are $\lambda = 2 - \sqrt{2}$ and 2 and $2 + \sqrt{2}$ (see 6.3 B). Their
sum is 6 (the trace of $A_3$) and their product is 4 (the determinant). The eigenvector matrix
$S$ gives the "Discrete Sine Transform" and the graph shows how the first two eigenvectors
fall onto sine curves. Please draw the third eigenvector onto a third sine curve!

$$S = \begin{bmatrix} 1 & \sqrt{2} & 1 \\ \sqrt{2} & 0 & -\sqrt{2} \\ 1 & -\sqrt{2} & 1 \end{bmatrix} \qquad \text{Eigenvector matrix for } A_3$$

[Graph: eigenvectors of $A_3$ plotted on sine curves; one curve is labeled $\sin 2t$.]


The eigenvalues of $B_4$ are $\lambda = 2 - \sqrt{2}$ and 2 and $2 + \sqrt{2}$ and 0 (the same as for
$A_3$, plus the zero eigenvalue). The trace is still 6, but the determinant is now zero. The
eigenvector matrix $C$ gives the 4-point "Discrete Cosine Transform" and the graph shows
how the first two eigenvectors fall onto cosine curves. (Please plot the third eigenvector.)


These eigenvectors match cosines at the halfway points $\frac{\pi}{8}, \frac{3\pi}{8}, \frac{5\pi}{8}, \frac{7\pi}{8}$.


cu} 1 v2-1 -1 1-v2 Sa 
1 1=-v2 -1 v2-1 Jot tt 
1 —] l -1 Oz ‘OO 


Eigenvector matrix for B4 


$S$ and $C$ have orthogonal columns (eigenvectors of the symmetric $A_3$ and $B_4$).
When we multiply a vector by $S$ or $C$, that signal splits into pure frequencies, as a musical
chord separates into pure notes. This is the most useful and insightful transform in all
of signal processing. Here is a MATLAB code to create $B_8$ and its eigenvector matrix $C$:


n=8; e=ones(n-1,1); B=2*eye(n)-diag(e,-1)-diag(e,1); B(1,1)=1; B(n,n)=1;
[C, Lambda] = eig(B);
plot(C(:,1:4), '-o')
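The cosine pattern can also be confirmed without an eigenvalue solver. The Python sketch below is my own illustration, not a translation of the MATLAB lines: the names `second_difference_B`, `cosine_vector` and `eigen_residual` are assumptions. It samples $\cos k\theta$ at the halfway points $\theta = (j + \frac{1}{2})\pi/n$ and tests $Bx = \lambda x$ with $\lambda = 2 - 2\cos(k\pi/n)$, a standard formula for this matrix that the text does not state.

```python
import math


def second_difference_B(n):
    """The -1, 2, -1 matrix of size n with corner entries changed to 1."""
    B = [[0.0] * n for _ in range(n)]
    for i in range(n):
        B[i][i] = 2.0
        if i > 0:
            B[i][i - 1] = -1.0
        if i < n - 1:
            B[i][i + 1] = -1.0
    B[0][0] = B[n - 1][n - 1] = 1.0
    return B


def cosine_vector(n, k):
    """Samples of cos(k * theta) at the halfway points theta = (j + 1/2) * pi / n."""
    return [math.cos(k * (j + 0.5) * math.pi / n) for j in range(n)]


def eigen_residual(n, k):
    """Largest entry of |B x - lambda x| for x = cosine_vector(n, k)."""
    B = second_difference_B(n)
    x = cosine_vector(n, k)
    lam = 2.0 - 2.0 * math.cos(k * math.pi / n)
    Bx = [sum(B[i][j] * x[j] for j in range(n)) for i in range(n)]
    return max(abs(Bx[i] - lam * x[i]) for i in range(n))
```

For $n = 4$ the values $k = 0, 1, 2, 3$ give $\lambda = 0$, $2 - \sqrt{2}$, $2$, $2 + \sqrt{2}$, the four eigenvalues of $B_4$ listed above. The same functions with $n = 8$ describe $B_8$.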


Problem Set 6.4


1 Write $A$ as $M + N$, symmetric matrix plus skew-symmetric matrix:

$$A = \begin{bmatrix} 1 & 2 & 4 \\ 4 & 3 & 0 \\ 8 & 6 & 5 \end{bmatrix} = M + N \qquad (M^T = M,\ N^T = -N).$$

For any square matrix, $M = \dfrac{A + A^T}{2}$ and $N = $ ____ add up to $A$.

2 If $C$ is symmetric prove that $A^TCA$ is also symmetric. (Transpose it.) When $A$ is 6
by 3, what are the shapes of $C$ and $A^TCA$?


3 Find the eigenvalues and the unit eigenvectors of


ste 

i 
NNN 
con 
oon 


Find an orthogonal matrix Q that diagonalizes A = | -2§]. What is A? 


Find an orthogonal matrix Q that diagonalizes this symmetric matrix: 


I1 0 2 
A= ;|0 -1 -2 
2-2 0 


Find all orthogonal matrices that diagonalize A = EF Al 


(a) Find a symmetric matrix [| } | that has a negative eigenvalue. 
(b) How do you know it must have a negative pivot? 


(c) How do you know it can’t have two negative eigenvalues? 


If A? = 0 then the eigenvalues of A must be . Give an example that has 
A # 0. But if A is symmetric, diagonalize it to prove that A must be zero. 


9 If $\lambda = a + ib$ is an eigenvalue of a real matrix $A$, then its conjugate $\bar{\lambda} = a - ib$ is
also an eigenvalue. (If $Ax = \lambda x$ then also $A\bar{x} = \bar{\lambda}\bar{x}$.) Prove that every real 3 by 3
matrix has at least one real eigenvalue.


10 Here is a quick "proof" that the eigenvalues of all real matrices are real:

$$\text{False proof} \qquad Ax = \lambda x \text{ gives } x^TAx = \lambda x^Tx \text{ so } \lambda = \frac{x^TAx}{x^Tx} \text{ is real.}$$

Find the flaw in this reasoning, a hidden assumption that is not justified. You could
test those steps on the 90° rotation matrix $[0\ {-1};\ 1\ 0]$ with $\lambda = i$ and $x = (i, 1)$.


11 Write $A$ and $B$ in the form $\lambda_1x_1x_1^T + \lambda_2x_2x_2^T$ of the spectral theorem $Q\Lambda Q^T$:
3 1 9 12 
A= Í 3] B= E | (keep |x l] = ||x2|| = 1). 


12 Every 2 by 2 symmetric matrix is $\lambda_1x_1x_1^T + \lambda_2x_2x_2^T = \lambda_1P_1 + \lambda_2P_2$. Explain
$P_1 + P_2 = x_1x_1^T + x_2x_2^T = I$ from columns times rows of $Q$. Why is $P_1P_2 = 0$?


13 What are the eigenvalues of $A = \begin{bmatrix} 0 & b \\ -b & 0 \end{bmatrix}$? Create a 4 by 4 skew-symmetric matrix
($A^T = -A$) and verify that all its eigenvalues are imaginary.


14 (Recommended) This matrix $M$ is skew-symmetric and also ____. Then all its
eigenvalues are pure imaginary and they also have $|\lambda| = 1$. ($\|Mx\| = \|x\|$ for every
$x$ so $\|\lambda x\| = \|x\|$ for eigenvectors.) Find all four eigenvalues from the trace of $M$:

$$M = \frac{1}{\sqrt{3}}\begin{bmatrix} 0 & 1 & 1 & 1 \\ -1 & 0 & -1 & 1 \\ -1 & 1 & 0 & -1 \\ -1 & -1 & 1 & 0 \end{bmatrix} \text{ can only have eigenvalues } i \text{ or } -i.$$
-Í -l 1 


15 Show that $A$ (symmetric but complex) has only one line of eigenvectors:

$$A = \begin{bmatrix} i & 1 \\ 1 & -i \end{bmatrix} \text{ is not even diagonalizable: eigenvalues } \lambda = 0, 0.$$

$A^T = A$ is not such a special property for complex matrices. The good property is
$\bar{A}^T = A$ (Section 10.2). Then all $\lambda$'s are real and eigenvectors are orthogonal.


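16 Even if $A$ is rectangular, the block matrix $B = \begin{bmatrix} 0 & A \\ A^T & 0 \end{bmatrix}$ is symmetric:

$$Bx = \lambda x \text{ is } \begin{bmatrix} 0 & A \\ A^T & 0 \end{bmatrix}\begin{bmatrix} y \\ z \end{bmatrix} = \lambda\begin{bmatrix} y \\ z \end{bmatrix} \text{ which is } \begin{aligned} Az &= \lambda y \\ A^Ty &= \lambda z. \end{aligned}$$

(a) Show that $-\lambda$ is also an eigenvalue, with the eigenvector $(y, -z)$.
(b) Show that $A^TAz = \lambda^2z$, so that $\lambda^2$ is an eigenvalue of $A^TA$.
(c) If $A = I$ (2 by 2) find all four eigenvalues and eigenvectors of $B$.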


IfA= [+] in Problem 16, find all three eigenvalues and eigenvectors of B. 
18 Another proof that eigenvectors are perpendicular when $A = A^T$. Two steps:


1. Suppose $Ax = \lambda x$ and $Ay = 0y$ and $\lambda \ne 0$. Then $y$ is in the nullspace
and $x$ is in the column space. They are perpendicular because ____. Go
carefully: why are these subspaces orthogonal?


2. If $Ay = \beta y$, apply this argument to $A - \beta I$. The eigenvalue of $A - \beta I$ moves
to zero and the eigenvectors stay the same, so they are perpendicular.


19 Find the eigenvector matrix $S$ for $A$ and for $B$. Show that $S$ doesn't collapse at
$d = 1$, even though $\lambda = 1$ is repeated. Are the eigenvectors perpendicular?

$$A = \begin{bmatrix} 0 & d & 0 \\ d & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix} \qquad B = \begin{bmatrix} -d & 0 & 1 \\ 0 & 1 & 0 \\ 0 & 0 & d \end{bmatrix} \qquad \text{have } \lambda = 1, d, -d.$$


20 Write a 2 by 2 complex matrix with $\bar{A}^T = A$ (a "Hermitian matrix"). Find $\lambda_1$ and $\lambda_2$
for your complex matrix. Adjust equations (1) and (2) to show that the eigenvalues
of a Hermitian matrix are real.


21 True (with reason) or false (with example). "Orthonormal" is not assumed.

(a) A matrix with real eigenvalues and eigenvectors is symmetric.
(b) A matrix with real eigenvalues and orthogonal eigenvectors is symmetric.
(c) The inverse of a symmetric matrix is symmetric.
(d) The eigenvector matrix $S$ of a symmetric matrix is symmetric.


22 (A paradox for instructors) If $AA^T = A^TA$ then $A$ and $A^T$ share the same eigenvectors
(true). $A$ and $A^T$ always share the same eigenvalues. Find the flaw in this
conclusion: They must have the same $S$ and $\Lambda$. Therefore $A$ equals $A^T$.


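23 (Recommended) Which of these classes of matrices do $A$ and $B$ belong to: Invertible, orthogonal, projection, permutation, diagonalizable, Markov?

\[ A = \begin{bmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{bmatrix} \qquad B = \frac{1}{3}\begin{bmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix} \]

Which of these factorizations are possible for $A$ and $B$: $LU$, $QR$, $S\Lambda S^{-1}$, $Q\Lambda Q^T$?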


24 What number $b$ in $\begin{bmatrix} 2 & b \\ 1 & 0 \end{bmatrix}$ makes $A = Q\Lambda Q^T$ possible? What number makes $A = S\Lambda S^{-1}$ impossible? What number makes $A^{-1}$ impossible?


25 Find all 2 by 2 matrices that are orthogonal and also symmetric. Which two numbers can be eigenvalues?


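26 This $A$ is nearly symmetric. But its eigenvectors are far from orthogonal:

\[ A = \begin{bmatrix} 1 & 10^{-15} \\ 0 & 1 + 10^{-15} \end{bmatrix} \quad\text{has eigenvectors}\quad \begin{bmatrix} 1 \\ 0 \end{bmatrix} \text{ and } \begin{bmatrix} ? \end{bmatrix} \]

What is the angle between the eigenvectors?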


(MATLAB) Take two symmetric matrices with different eigenvectors, say A = | 1$] 
and B = [82]. Graph the eigenvalues $\lambda_1(A + tB)$ and $\lambda_2(A + tB)$ for $-8 < t < 8$.
Peter Lax says on page 113 of Linear Algebra that $\lambda_1$ and $\lambda_2$ appear to be on a
collision course at certain values of $t$. “Yet at the last minute they turn aside.” How
close do they come?


Challenge Problems 


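28 For complex matrices, the symmetry $A^T = A$ that produces real eigenvalues changes to $\overline{A}^T = A$. From $\det(A - \lambda I) = 0$, find the eigenvalues of the 2 by 2 "Hermitian" matrix $A = \begin{bmatrix} 4 & 2+i \\ 2-i & 0 \end{bmatrix} = \overline{A}^T$. To see why eigenvalues are real when $\overline{A}^T = A$, adjust equation (1) of the text to $\overline{A}\,\overline{x} = \overline{\lambda}\,\overline{x}$.

Transpose to $\overline{x}^T\overline{A}^T = \overline{\lambda}\,\overline{x}^T$. With $\overline{A}^T = A$, reach equation (2): $\lambda = \overline{\lambda}$.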

29 Normal matrices have $\overline{A}^TA = A\overline{A}^T$. For real matrices, $A^TA = AA^T$ includes symmetric, skew-symmetric, and orthogonal. Those have real $\lambda$, imaginary $\lambda$, and $|\lambda| = 1$. Other normal matrices can have any complex eigenvalues $\lambda$.

Key point: Normal matrices have $n$ orthonormal eigenvectors. Those vectors $x_i$ probably will have complex components. In that complex case orthogonality means $\overline{x}_i^Tx_j = 0$ as Chapter 10 explains. Inner products (dot products) become $\overline{x}^Ty$.


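The test for $n$ orthonormal columns in $Q$ becomes $\overline{Q}^TQ = I$ instead of $Q^TQ = I$.

$A$ has $n$ orthonormal eigenvectors ($A = Q\Lambda\overline{Q}^T$) if and only if $A$ is normal.

(a) Start from $A = Q\Lambda\overline{Q}^T$ with $\overline{Q}^TQ = I$. Show that $\overline{A}^TA = A\overline{A}^T$: $A$ is normal.

(b) Now start from $\overline{A}^TA = A\overline{A}^T$. Schur found $A = QT\overline{Q}^T$ for every matrix $A$, with a triangular $T$. For normal matrices we must show (in 3 steps) that this $T$ will actually be diagonal. Then $T = \Lambda$.

Step 1. Put $A = QT\overline{Q}^T$ into $\overline{A}^TA = A\overline{A}^T$ to find $\overline{T}^TT = T\overline{T}^T$.

Step 2. Suppose $T = \begin{bmatrix} a & b \\ 0 & d \end{bmatrix}$ has $\overline{T}^TT = T\overline{T}^T$. Prove that $b = 0$.

Step 3. Extend Step 2 to size $n$. A normal triangular $T$ must be diagonal.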


30 If $\lambda_{\max}$ is the largest eigenvalue of a symmetric matrix $A$, no diagonal entry can be larger than $\lambda_{\max}$. What is the first entry $a_{11}$ of $A = Q\Lambda Q^T$? Show why $a_{11} \le \lambda_{\max}$.


31 Suppose $A^T = -A$ (real antisymmetric matrix). Explain these facts about $A$:

(a) $x^TAx = 0$ for every real vector $x$.
(b) The eigenvalues of $A$ are pure imaginary.
(c) The determinant of $A$ is positive or zero (not negative).

For (a), multiply out an example of $x^TAx$ and watch terms cancel. Or reverse $x^T(Ax)$ to $(Ax)^Tx$. For (b), $Az = \lambda z$ leads to $\overline{z}^TAz = \lambda\overline{z}^Tz = \lambda\|z\|^2$. Part (a) shows that $\overline{z}^TAz = (x - iy)^TA(x + iy)$ has zero real part. Then (b) helps with (c).


32 If $A$ is symmetric and all its eigenvalues are $\lambda = 2$, how do you know that $A$ must be $2I$? (Key point: Symmetry guarantees that $A$ is diagonalizable. See “Proofs of the Spectral Theorem” on web.mit.edu/18.06.)

6.5 Positive Definite Matrices


This section concentrates on symmetric matrices that have positive eigenvalues. If symmetry makes a matrix important, this extra property (all $\lambda > 0$) makes it truly special. When we say special, we don't mean rare. Symmetric matrices with positive eigenvalues are at the center of all kinds of applications. They are called positive definite.

The first problem is to recognize these matrices. You may say, just find the eigenvalues and test $\lambda > 0$. That is exactly what we want to avoid. Calculating eigenvalues is work. When the $\lambda$'s are needed, we can compute them. But if we just want to know that they are positive, there are faster ways. Here are two goals of this section:


• To find quick tests on a symmetric matrix that guarantee positive eigenvalues.
• To explain important applications of positive definiteness.


The $\lambda$'s are automatically real because the matrix is symmetric.
Start with 2 by 2. When does $A = \begin{bmatrix} a & b \\ b & c \end{bmatrix}$ have $\lambda_1 > 0$ and $\lambda_2 > 0$?

The eigenvalues of $A$ are positive if and only if $a > 0$ and $ac - b^2 > 0$.

$A_1 = \begin{bmatrix} 1 & 2 \\ 2 & 1 \end{bmatrix}$ is not positive definite because $ac - b^2 = 1 - 4 < 0$
$A_2 = \begin{bmatrix} 1 & -2 \\ -2 & 6 \end{bmatrix}$ is positive definite because $a = 1$ and $ac - b^2 = 6 - 4 > 0$
$A_3 = \begin{bmatrix} -1 & 2 \\ 2 & -6 \end{bmatrix}$ is not positive definite (even with $\det A = +2$) because $a = -1$

Notice that we didn't compute the eigenvalues 3 and $-1$ of $A_1$. Positive trace $3 - 1 = 2$, negative determinant $(3)(-1) = -3$. And $A_3 = -A_2$ is negative definite. Two positive eigenvalues for $A_2$, two negative eigenvalues for $A_3$.


Proof that the 2 by 2 test is passed when $\lambda_1 > 0$ and $\lambda_2 > 0$. Their product $\lambda_1\lambda_2$ is the determinant so $ac - b^2 > 0$. Their sum is the trace so $a + c > 0$. Then $a$ and $c$ are both positive (if one of them is not positive, $ac - b^2 > 0$ will fail). Problem 1 reverses the reasoning to show that the tests guarantee $\lambda_1 > 0$ and $\lambda_2 > 0$.

This test uses the 1 by 1 determinant $a$ and the 2 by 2 determinant $ac - b^2$. When $A$ is 3 by 3, $\det A > 0$ is the third part of the test. The next test requires positive pivots.


The eigenvalues of $A = A^T$ are positive if and only if the pivots are positive:

\[ a > 0 \quad\text{and}\quad \frac{ac - b^2}{a} > 0. \]

$a > 0$ is required in both tests. So $ac > b^2$ is also required, for the determinant test and now the pivot. The point is to recognize that ratio as the second pivot of $A$:

\[ \begin{bmatrix} a & b \\ b & c \end{bmatrix} \longrightarrow \begin{bmatrix} a & b \\ 0 & c - \frac{b}{a}b \end{bmatrix} \]

The first pivot is $a$. The multiplier is $b/a$. The second pivot is $c - \dfrac{b^2}{a} = \dfrac{ac - b^2}{a}$.

This connects two big parts of linear algebra. Positive eigenvalues mean positive pivots and vice versa. We gave a proof for symmetric matrices of any size in the last section. The pivots give a quick test for $\lambda > 0$, and they are a lot faster to compute than the eigenvalues. It is very satisfying to see pivots and determinants and eigenvalues come together in this course.
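\[ A_1 = \begin{bmatrix} 1 & 2 \\ 2 & 1 \end{bmatrix} \qquad A_2 = \begin{bmatrix} 1 & -2 \\ -2 & 6 \end{bmatrix} \qquad A_3 = \begin{bmatrix} -1 & 2 \\ 2 & -6 \end{bmatrix} \]

$A_1$: pivots 1 and $-3$ (indefinite). $A_2$: pivots 1 and 2 (positive definite). $A_3$: pivots $-1$ and $-2$ (negative definite).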
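
The three 2 by 2 tests (upper left determinants, pivots, eigenvalues) can be compared numerically. The following is only a small sketch in plain Python; the helper names are made up for illustration, and the entries of $A_1$, $A_2$, $A_3$ are taken as $[1\ 2;\ 2\ 1]$, $[1\ {-2};\ {-2}\ 6]$, $[-1\ 2;\ 2\ {-6}]$, which is what the determinants and pivots quoted in the text imply.

```python
import math


def eigenvalues_2x2(a, b, c):
    """Eigenvalues of the symmetric matrix [[a, b], [b, c]], smaller one first."""
    mean = (a + c) / 2.0
    radius = math.hypot((a - c) / 2.0, b)
    return mean - radius, mean + radius


def pivots_2x2(a, b, c):
    """Pivots from elimination without row exchanges (assumes a != 0)."""
    return a, c - (b / a) * b


def passes_determinant_test(a, b, c):
    """Upper left determinants: a > 0 and ac - b^2 > 0."""
    return a > 0 and a * c - b * b > 0


# (a, b, c) for the three examples; an assumption read off from the stated tests.
examples = {"A1": (1, 2, 1), "A2": (1, -2, 6), "A3": (-1, 2, -6)}

for name, (a, b, c) in examples.items():
    print(name,
          "eigenvalues:", eigenvalues_2x2(a, b, c),
          "pivots:", pivots_2x2(a, b, c),
          "positive definite:", passes_determinant_test(a, b, c))
```

Only $A_2$ passes: its pivots and its eigenvalues are all positive, while $A_1$ has one negative pivot and one negative eigenvalue.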


Here is a different way to look at symmetric matrices with positive eigenvalues. 


Energy-based Definition 


From $Ax = \lambda x$, multiply by $x^T$ to get $x^TAx = \lambda x^Tx$. The right side is a positive $\lambda$ times a positive number $x^Tx = \|x\|^2$. So $x^TAx$ is positive for any eigenvector.

The new idea is that $x^TAx$ is positive for all nonzero vectors $x$, not just the eigenvectors. In many applications this number $x^TAx$ (or $\frac{1}{2}x^TAx$) is the energy in the system. The requirement of positive energy gives another definition of a positive definite matrix. I think this energy-based definition is the fundamental one.

Eigenvalues and pivots are two equivalent ways to test the new requirement $x^TAx > 0$.


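Definition $A$ is positive definite if $x^TAx > 0$ for every nonzero vector $x$:

\[ x^TAx = \begin{bmatrix} x & y \end{bmatrix}\begin{bmatrix} a & b \\ b & c \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} = ax^2 + 2bxy + cy^2 > 0. \tag{1} \]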


The four entries $a, b, b, c$ give the four parts of $x^TAx$. From $a$ and $c$ come the pure squares $ax^2$ and $cy^2$. From $b$ and $b$ off the diagonal come the cross terms $bxy$ and $byx$ (the same). Adding those four parts gives $x^TAx$. This energy-based definition leads to a basic fact:

If $A$ and $B$ are symmetric positive definite, so is $A + B$.

Reason: $x^T(A + B)x$ is simply $x^TAx + x^TBx$. Those two terms are positive (for $x \neq 0$) so $A + B$ is also positive definite. The pivots and eigenvalues are not easy to follow when matrices are added, but the energies just add.

$x^TAx$ also connects with our final way to recognize a positive definite matrix. Start with any matrix $R$, possibly rectangular. We know that $A = R^TR$ is square and symmetric. More than that, $A$ will be positive definite when $R$ has independent columns:

If the columns of $R$ are independent, then $A = R^TR$ is positive definite.

Again eigenvalues and pivots are not easy. But the number $x^TAx$ is the same as $x^TR^TRx$. That is exactly $(Rx)^T(Rx)$: another important proof by parenthesis! That vector $Rx$ is not zero when $x \neq 0$ (this is the meaning of independent columns). Then $x^TAx$ is the positive number $\|Rx\|^2$ and the matrix $A$ is positive definite.

Let me collect this theory together, into five equivalent statements of positive definiteness. You will see how that key idea connects the whole subject of linear algebra: pivots, determinants, eigenvalues, and least squares (from $R^TR$). Then come the applications.


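When a symmetric matrix has one of these five properties, it has them all:

1. All $n$ pivots are positive.
2. All $n$ upper left determinants are positive.
3. All $n$ eigenvalues are positive.
4. $x^TAx$ is positive except at $x = 0$. This is the energy-based definition.
5. $A$ equals $R^TR$ for a matrix $R$ with independent columns.

The “upper left determinants” are 1 by 1, 2 by 2, ..., $n$ by $n$. The last one is the determinant of the complete matrix $A$. This remarkable theorem ties together the whole linear algebra course, at least for symmetric matrices. We believe that two examples are more helpful than a detailed proof (we nearly have a proof already).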


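Example 1 Test these matrices $A$ and $B$ for positive definiteness:

\[ A = \begin{bmatrix} 2 & -1 & 0 \\ -1 & 2 & -1 \\ 0 & -1 & 2 \end{bmatrix} \quad\text{and}\quad B = \begin{bmatrix} 2 & -1 & b \\ -1 & 2 & -1 \\ b & -1 & 2 \end{bmatrix} \]

Solution The pivots of $A$ are 2 and $\frac{3}{2}$ and $\frac{4}{3}$, all positive. Its upper left determinants are 2 and 3 and 4, all positive. The eigenvalues of $A$ are $2 - \sqrt{2}$ and 2 and $2 + \sqrt{2}$, all positive. That completes tests 1, 2, and 3.

We can write $x^TAx$ as a sum of three squares. The pivots $2, \frac{3}{2}, \frac{4}{3}$ appear outside the squares. The multipliers $-\frac{1}{2}$ and $-\frac{2}{3}$ from elimination are inside the squares:

\[ \begin{aligned} x^TAx &= 2\left(x_1^2 - x_1x_2 + x_2^2 - x_2x_3 + x_3^2\right) && \text{Rewrite with squares} \\ &= 2\left(x_1 - \tfrac{1}{2}x_2\right)^2 + \tfrac{3}{2}\left(x_2 - \tfrac{2}{3}x_3\right)^2 + \tfrac{4}{3}x_3^2. && \text{This sum is positive.} \end{aligned} \]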

I have two candidates to suggest for $R$. Either one will show that $A = R^TR$ is positive definite. $R$ can be a rectangular first difference matrix, 4 by 3, to produce those second differences $-1, 2, -1$ in $A$:

\[ A = R^TR \qquad \begin{bmatrix} 2 & -1 & 0 \\ -1 & 2 & -1 \\ 0 & -1 & 2 \end{bmatrix} = \begin{bmatrix} 1 & -1 & 0 & 0 \\ 0 & 1 & -1 & 0 \\ 0 & 0 & 1 & -1 \end{bmatrix}\begin{bmatrix} 1 & 0 & 0 \\ -1 & 1 & 0 \\ 0 & -1 & 1 \\ 0 & 0 & -1 \end{bmatrix} \]

The three columns of this $R$ are independent. $A$ is positive definite.
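
The "proof by parenthesis" $x^TAx = (Rx)^T(Rx)$ can be checked on this first difference matrix. The sketch below uses plain Python lists; the helper functions and the test vector $x = (1, 2, 3)$ are illustrative choices, not part of the text.

```python
def transpose(M):
    """Transpose of a matrix stored as a list of rows."""
    return [list(row) for row in zip(*M)]


def matmul(A, B):
    """Matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def matvec(M, x):
    """Matrix times vector."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]


def energy(A, x):
    """The number x^T A x."""
    return sum(xi * yi for xi, yi in zip(x, matvec(A, x)))


# 4 by 3 first difference matrix R; A = R^T R is the -1, 2, -1 matrix.
R = [[1, 0, 0],
     [-1, 1, 0],
     [0, -1, 1],
     [0, 0, -1]]
A = matmul(transpose(R), R)

x = [1, 2, 3]                    # any nonzero vector would do
Rx = matvec(R, x)
print(A)                         # second differences -1, 2, -1
print(energy(A, x), sum(v * v for v in Rx))   # the same number twice
```

Because the columns of $R$ are independent, $Rx$ is nonzero for every nonzero $x$, so both printed energies are positive.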
Another $R$ comes from $A = LDL^T$ (the symmetric version of $A = LU$). Elimination gives the pivots $2, \frac{3}{2}, \frac{4}{3}$ in $D$ and the multipliers $-\frac{1}{2}, 0, -\frac{2}{3}$ in $L$. Just put $\sqrt{D}$ with $L$.

\[ LDL^T = \begin{bmatrix} 1 & & \\ -\frac{1}{2} & 1 & \\ 0 & -\frac{2}{3} & 1 \end{bmatrix}\begin{bmatrix} 2 & & \\ & \frac{3}{2} & \\ & & \frac{4}{3} \end{bmatrix}\begin{bmatrix} 1 & -\frac{1}{2} & 0 \\ & 1 & -\frac{2}{3} \\ & & 1 \end{bmatrix} = (L\sqrt{D})(L\sqrt{D})^T = R^TR. \tag{2} \]

$R$ is the Cholesky factor. This choice of $R$ has square roots (not so beautiful). But it is the only $R$ that is 3 by 3 and upper triangular. It is the “Cholesky factor” of $A$ and it is computed by MATLAB's command R = chol(A). In applications, the rectangular $R$ is how we build $A$ and this Cholesky $R$ is how we break it apart.

Eigenvalues give the symmetric choice $R = Q\sqrt{\Lambda}Q^T$. This is also successful with $R^TR = Q\Lambda Q^T = A$. All these tests show that the $-1, 2, -1$ matrix $A$ is positive definite.

Now turn to $B$, where the (1, 3) and (3, 1) entries move away from 0 to $b$. This $b$ must not be too large! The determinant test is easiest. The 1 by 1 determinant is 2, the 2 by 2 determinant is still 3. The 3 by 3 determinant involves $b$:

\[ \det B = 4 + 2b - 2b^2 = (1 + b)(4 - 2b) \text{ must be positive.} \]

At $b = -1$ and $b = 2$ we get $\det B = 0$. Between $b = -1$ and $b = 2$ the matrix is positive definite. The corner entry $b = 0$ in the first matrix $A$ was safely between.
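
The determinant test for $B$ is easy to tabulate. The sketch below (plain Python, with hypothetical helper names) evaluates the three upper left determinants for a few sample values of $b$ chosen for illustration.

```python
def det3(M):
    """Determinant of a 3 by 3 matrix by cofactors along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = M
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def upper_left_determinants(b):
    """The 1 by 1, 2 by 2, and 3 by 3 upper left determinants of B."""
    B = [[2, -1, b], [-1, 2, -1], [b, -1, 2]]
    d1 = B[0][0]
    d2 = B[0][0] * B[1][1] - B[0][1] * B[1][0]
    return d1, d2, det3(B)


def is_positive_definite(b):
    """Determinant test: all upper left determinants must be positive."""
    return all(d > 0 for d in upper_left_determinants(b))


for b in (-1.5, -1, 0, 1, 2, 2.5):
    print(b, upper_left_determinants(b), is_positive_definite(b))
```

The third determinant agrees with $4 + 2b - 2b^2$, and the test passes only for $-1 < b < 2$.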


Positive Semidefinite Matrices 


Often we are at the edge of positive definiteness. The determinant is zero. The smallest eigenvalue is zero. The energy in its eigenvector is $x^TAx = x^T0x = 0$. These matrices on the edge are called positive semidefinite. Here are two examples (not invertible):

\[ A = \begin{bmatrix} 1 & 2 \\ 2 & 4 \end{bmatrix} \quad\text{and}\quad B = \begin{bmatrix} 2 & -1 & -1 \\ -1 & 2 & -1 \\ -1 & -1 & 2 \end{bmatrix} \quad\text{are positive semidefinite.} \]

$A$ has eigenvalues 5 and 0. Its upper left determinants are 1 and 0. Its rank is only 1. This matrix $A$ factors into $R^TR$ with dependent columns in $R$:

\[ \text{Dependent columns, positive semidefinite}\qquad \begin{bmatrix} 1 & 2 \\ 2 & 4 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 2 & 0 \end{bmatrix}\begin{bmatrix} 1 & 2 \\ 0 & 0 \end{bmatrix} = R^TR. \]

If 4 is increased by any small number, the matrix will become positive definite.

The cyclic $B$ also has zero determinant (computed above when $b = -1$). It is singular. The eigenvector $x = (1, 1, 1)$ has $Bx = 0$ and $x^TBx = 0$. Vectors $x$ in all other directions do give positive energy. This $B$ can be written as $R^TR$ in many ways, but $R$ will always have dependent columns, with $(1, 1, 1)$ in its nullspace:

\[ \begin{array}{l} \text{Second differences } A \\ \text{from first differences } R^TR \\ \text{Cyclic } A \text{ from cyclic } R \end{array} \qquad \begin{bmatrix} 2 & -1 & -1 \\ -1 & 2 & -1 \\ -1 & -1 & 2 \end{bmatrix} = \begin{bmatrix} 1 & -1 & 0 \\ 0 & 1 & -1 \\ -1 & 0 & 1 \end{bmatrix}\begin{bmatrix} 1 & 0 & -1 \\ -1 & 1 & 0 \\ 0 & -1 & 1 \end{bmatrix} \]

Positive semidefinite matrices have all $\lambda \ge 0$ and all $x^TAx \ge 0$. Those weak inequalities ($\ge$ instead of $>$) include positive definite matrices and the singular matrices at the edge.


First Application: The Ellipse $ax^2 + 2bxy + cy^2 = 1$


Think of a tilted ellipse $x^TAx = 1$. Its center is (0,0), as in Figure 6.7a. Turn it to line up with the coordinate axes ($X$ and $Y$ axes). That is Figure 6.7b. These two pictures show the geometry behind the factorization $A = Q\Lambda Q^{-1} = Q\Lambda Q^T$:


1. The tilted ellipse is associated with $A$. Its equation is $x^TAx = 1$.
2. The lined-up ellipse is associated with $\Lambda$. Its equation is $X^T\Lambda X = 1$.
3. The rotation matrix that lines up the ellipse is the eigenvector matrix $Q$.


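Example 2 Find the axes of this tilted ellipse $5x^2 + 8xy + 5y^2 = 1$.


Solution Start with the positive definite matrix that matches this equation:

\[ \text{The equation is}\quad \begin{bmatrix} x & y \end{bmatrix}\begin{bmatrix} 5 & 4 \\ 4 & 5 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} = 1. \qquad \text{The matrix is}\quad A = \begin{bmatrix} 5 & 4 \\ 4 & 5 \end{bmatrix}. \]

Figure 6.7: The tilted ellipse $5x^2 + 8xy + 5y^2 = 1$. Lined up it is $9X^2 + Y^2 = 1$.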
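
The eigenvectors are $\begin{bmatrix} 1 \\ 1 \end{bmatrix}$ and $\begin{bmatrix} 1 \\ -1 \end{bmatrix}$. Divide by $\sqrt{2}$ for unit vectors. Then $A = Q\Lambda Q^T$:

\[ \begin{array}{l} \text{Eigenvectors in } Q \\ \text{Eigenvalues 9 and 1} \end{array} \qquad \begin{bmatrix} 5 & 4 \\ 4 & 5 \end{bmatrix} = \frac{1}{\sqrt{2}}\begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}\begin{bmatrix} 9 & 0 \\ 0 & 1 \end{bmatrix}\frac{1}{\sqrt{2}}\begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}. \]

Now multiply by $\begin{bmatrix} x & y \end{bmatrix}$ on the left and $\begin{bmatrix} x \\ y \end{bmatrix}$ on the right to get back to $x^TAx$:

\[ x^TAx = \text{sum of squares} \qquad 5x^2 + 8xy + 5y^2 = 9\left(\frac{x + y}{\sqrt{2}}\right)^2 + 1\left(\frac{x - y}{\sqrt{2}}\right)^2. \tag{3} \]

The coefficients are not the pivots 5 and 9/5 from $D$, they are the eigenvalues 9 and 1 from $\Lambda$. Inside these squares are the eigenvectors $(1, 1)/\sqrt{2}$ and $(1, -1)/\sqrt{2}$.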

The axes of the tilted ellipse point along the eigenvectors. This explains why $A = Q\Lambda Q^T$ is called the “principal axis theorem”: it displays the axes. Not only the axis directions (from the eigenvectors) but also the axis lengths (from the eigenvalues). To see it all, use capital letters for the new coordinates that line up the ellipse:

\[ \text{Lined up}\qquad \frac{x + y}{\sqrt{2}} = X \quad\text{and}\quad \frac{x - y}{\sqrt{2}} = Y \quad\text{and}\quad 9X^2 + Y^2 = 1. \]

The largest value of $X^2$ is 1/9. The endpoint of the shorter axis has $X = 1/3$ and $Y = 0$. Notice: The bigger eigenvalue $\lambda_1 = 9$ gives the shorter axis, of half-length $1/\sqrt{\lambda_1} = 1/3$. The smaller eigenvalue $\lambda_2 = 1$ gives the greater length $1/\sqrt{\lambda_2} = 1$.

In the $xy$ system, the axes are along the eigenvectors of $A$. In the $XY$ system, the axes are along the eigenvectors of $\Lambda$, which are the coordinate axes. All comes from $A = Q\Lambda Q^T$.


Suppose $A = Q\Lambda Q^T$ is positive definite, so $\lambda_i > 0$. The graph of $x^TAx = 1$ is an ellipse:

\[ \begin{bmatrix} x & y \end{bmatrix} Q\Lambda Q^T \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} X & Y \end{bmatrix} \Lambda \begin{bmatrix} X \\ Y \end{bmatrix} = \lambda_1X^2 + \lambda_2Y^2 = 1. \]

The axes point along eigenvectors. The half-lengths are $1/\sqrt{\lambda_1}$ and $1/\sqrt{\lambda_2}$.

$A = I$ gives the circle $x^2 + y^2 = 1$. If one eigenvalue is negative (exchange 4's and 5's in $A$), we don't have an ellipse. The sum of squares becomes a difference of squares: $9X^2 - Y^2 = 1$. This indefinite matrix gives a hyperbola. For a negative definite matrix like $A = -I$, with both $\lambda$'s negative, the graph of $-x^2 - y^2 = 1$ has no points at all.
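
The principal axes of $ax^2 + 2bxy + cy^2 = 1$ follow from the 2 by 2 eigenvalue formulas. The function below is an illustrative sketch (the name `principal_axes` and its return format are assumptions made here): it returns each eigenvalue with a unit eigenvector and the half-length $1/\sqrt{\lambda}$.

```python
import math


def principal_axes(a, b, c):
    """Axes of the ellipse a x^2 + 2 b x y + c y^2 = 1.

    Returns a list of (eigenvalue, unit eigenvector, half-length),
    larger eigenvalue (shorter axis) first.
    """
    if not (a > 0 and a * c - b * b > 0):
        raise ValueError("not positive definite, so the graph is not an ellipse")
    if b == 0:
        # Already lined up: the axes are the coordinate axes.
        pairs = [(float(a), (1.0, 0.0)), (float(c), (0.0, 1.0))]
        pairs.sort(key=lambda p: p[0], reverse=True)
    else:
        mean = (a + c) / 2.0
        radius = math.hypot((a - c) / 2.0, b)
        pairs = []
        for lam in (mean + radius, mean - radius):
            # (b, lam - a) solves the first row of (A - lam I) v = 0.
            norm = math.hypot(b, lam - a)
            pairs.append((lam, (b / norm, (lam - a) / norm)))
    return [(lam, v, 1.0 / math.sqrt(lam)) for lam, v in pairs]


for lam, v, half in principal_axes(5, 4, 5):
    print("eigenvalue", lam, "direction", v, "half-length", half)
```

For Example 2 this gives the direction $(1, 1)/\sqrt{2}$ with half-length $1/3$ and the direction $(1, -1)/\sqrt{2}$ with half-length 1. Exchanging the 4's and 5's raises the error, because that matrix is indefinite.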


■ REVIEW OF THE KEY IDEAS ■


1. Positive definite matrices have positive eigenvalues and positive pivots.

2. A quick test is given by the upper left determinants: $a > 0$ and $ac - b^2 > 0$.

3. The graph of $x^TAx$ is then a “bowl” going up from $x = 0$:
$x^TAx = ax^2 + 2bxy + cy^2$ is positive except at $(x, y) = (0,0)$.

4. $A = R^TR$ is automatically positive definite if $R$ has independent columns.

5. The ellipse $x^TAx = 1$ has its axes along the eigenvectors of $A$. Lengths $1/\sqrt{\lambda}$.


■ WORKED EXAMPLES ■


6.5 A The great factorizations of a symmetric matrix are $A = LDL^T$ from pivots and multipliers, and $A = Q\Lambda Q^T$ from eigenvalues and eigenvectors. Show that $x^TAx > 0$ for all nonzero $x$ exactly when the pivots and eigenvalues are positive. Try these $n$ by $n$ tests on pascal(6) and ones(6) and hilb(6) and other matrices in MATLAB's gallery.

Solution To prove $x^TAx > 0$, put parentheses into $x^TLDL^Tx$ and $x^TQ\Lambda Q^Tx$:

\[ x^TAx = (L^Tx)^TD(L^Tx) \quad\text{and}\quad x^TAx = (Q^Tx)^T\Lambda(Q^Tx). \]

If $x$ is nonzero, then $y = L^Tx$ and $z = Q^Tx$ are nonzero (those matrices are invertible). So $x^TAx = y^TDy = z^T\Lambda z$ becomes a sum of squares and $A$ is shown as positive definite:

\[ \begin{array}{ll} \text{Pivots} & x^TAx = y^TDy = d_1y_1^2 + \cdots + d_ny_n^2 > 0 \\ \text{Eigenvalues} & x^TAx = z^T\Lambda z = \lambda_1z_1^2 + \cdots + \lambda_nz_n^2 > 0 \end{array} \]
MATLAB has a gallery of unusual matrices (type help gallery) and here are four:

pascal(6) is positive definite because all its pivots are 1 (Worked Example 2.6 A).
ones(6) is positive semidefinite because its eigenvalues are 0, 0, 0, 0, 0, 6.
H = hilb(6) is positive definite even though eig(H) shows two eigenvalues very near zero.
Hilbert matrix $x^THx = \int_0^1 (x_1 + x_2s + \cdots + x_6s^5)^2\,ds > 0$, $H_{ij} = 1/(i + j - 1)$.
rand(6)+rand(6)' can be positive definite or not. Experiments gave only 2 in 20000.
n = 20000; p = 0; for k = 1:n, A = rand(6); p = p + all(eig(A + A') > 0); end, p/n


6.5 B When is the symmetric block matrix $M = \begin{bmatrix} A & B \\ B^T & C \end{bmatrix}$ positive definite?

Solution Multiply the first row of $M$ by $B^TA^{-1}$ and subtract from the second row, to get a block of zeros. The Schur complement $S = C - B^TA^{-1}B$ appears in the corner:

\[ \begin{bmatrix} I & 0 \\ -B^TA^{-1} & I \end{bmatrix}\begin{bmatrix} A & B \\ B^T & C \end{bmatrix} = \begin{bmatrix} A & B \\ 0 & C - B^TA^{-1}B \end{bmatrix} = \begin{bmatrix} A & B \\ 0 & S \end{bmatrix} \tag{4} \]

Those two blocks $A$ and $S$ must be positive definite. Their pivots are the pivots of $M$.


6.5 C Second application: Test for a minimum. Does $F(x, y)$ have a minimum if $\partial F/\partial x = 0$ and $\partial F/\partial y = 0$ at the point $(x, y) = (0,0)$?

Solution For $f(x)$, the test for a minimum comes from calculus: $df/dx = 0$ and $d^2f/dx^2 > 0$. Moving to two variables $x$ and $y$ produces a symmetric matrix $H$. It contains the four second derivatives of $F(x, y)$. Positive $f''$ changes to positive definite $H$:

\[ \text{Second derivative matrix}\qquad H = \begin{bmatrix} \partial^2F/\partial x^2 & \partial^2F/\partial x\,\partial y \\ \partial^2F/\partial y\,\partial x & \partial^2F/\partial y^2 \end{bmatrix} \]

$F(x, y)$ has a minimum if $H$ is positive definite. Reason: $H$ reveals the important terms $ax^2 + 2bxy + cy^2$ near $(x, y) = (0,0)$. The second derivatives of $F$ are $2a, 2b, 2b, 2c$!


6.5 D Find the eigenvalues of the $-1, 2, -1$ tridiagonal $n$ by $n$ matrix $K$ (my favorite).

Solution The best way is to guess $\lambda$ and $x$. Then check $Kx = \lambda x$. Guessing could not work for most matrices, but special cases are a big part of mathematics (pure and applied).

The key is hidden in a differential equation. The second difference matrix $K$ is like a second derivative, and those eigenvalues are much easier to see:

\[ \begin{array}{l} \text{Eigenvalues } \lambda_1, \lambda_2, \ldots \\ \text{Eigenfunctions } y_1, y_2, \ldots \end{array} \qquad \frac{d^2y}{dx^2} = \lambda y(x) \quad\text{with}\quad \begin{array}{l} y(0) = 0 \\ y(1) = 0 \end{array} \tag{5} \]

Try $y = \sin cx$. Its second derivative is $y'' = -c^2\sin cx$. So the eigenvalue will be $\lambda = -c^2$, provided $y(x)$ satisfies the end point conditions $y(0) = 0 = y(1)$.

Certainly $\sin 0 = 0$ (this is where cosines are eliminated by $\cos 0 = 1$). At $x = 1$, we need $y(1) = \sin c = 0$. The number $c$ must be $k\pi$, a multiple of $\pi$, and $\lambda$ is $-c^2$:

\[ \begin{array}{l} \text{Eigenvalues } \lambda = -k^2\pi^2 \\ \text{Eigenfunctions } y = \sin k\pi x \end{array} \qquad \frac{d^2}{dx^2}\sin k\pi x = -k^2\pi^2\sin k\pi x. \tag{6} \]


Now we go back to the matrix $K$ and guess its eigenvectors. They come from $\sin k\pi x$ at $n$ points $x = h, 2h, \ldots, nh$, equally spaced between 0 and 1. The spacing $\Delta x$ is $h = 1/(n + 1)$, so the $(n + 1)$st point comes out at $(n + 1)h = 1$. Multiply that sine vector $s$ by $K$:

\[ \text{Eigenvector of } K = \text{sine vector } s \qquad \begin{array}{l} Ks = \lambda s = (2 - 2\cos k\pi h)\,s \\ s = (\sin k\pi h, \ldots, \sin nk\pi h). \end{array} \tag{7} \]

I will leave that multiplication $Ks = \lambda s$ as a challenge problem. Notice what is important:

1. All eigenvalues $2 - 2\cos k\pi h$ are positive and $K$ is positive definite.

2. The sine matrix $S$ has orthogonal columns = eigenvectors $s_1, \ldots, s_n$ of $K$.
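
Equation (7) can be confirmed numerically before attempting the algebra. This is a short sketch in plain Python with hypothetical helper names; the size $n = 5$ is chosen only as an example. It applies the $-1, 2, -1$ stencil to each sine vector and compares with $(2 - 2\cos k\pi h)s$.

```python
import math


def apply_K(v):
    """Multiply the -1, 2, -1 tridiagonal matrix K by the vector v."""
    n = len(v)
    out = []
    for i in range(n):
        left = v[i - 1] if i > 0 else 0.0
        right = v[i + 1] if i < n - 1 else 0.0
        out.append(-left + 2.0 * v[i] - right)
    return out


def sine_eigenpair(n, k):
    """Candidate eigenvalue 2 - 2 cos(k pi h) and sine vector for K of size n."""
    h = 1.0 / (n + 1)
    lam = 2.0 - 2.0 * math.cos(k * math.pi * h)
    s = [math.sin(j * k * math.pi * h) for j in range(1, n + 1)]
    return lam, s


n = 5
eigenvalues = []
residuals = []
for k in range(1, n + 1):
    lam, s = sine_eigenpair(n, k)
    Ks = apply_K(s)
    residuals.append(max(abs(a - lam * b) for a, b in zip(Ks, s)))
    eigenvalues.append(lam)
    print(k, lam, residuals[-1])

print(sum(eigenvalues), math.prod(eigenvalues))
```

The residuals are at roundoff level, all five eigenvalues are positive, and their sum and product match the trace $2n$ and the determinant $n + 1$ of $K$.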
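
\[ \begin{array}{l} \text{Discrete Sine Transform} \\ \text{The } j, k \text{ entry is } \sin jk\pi h \end{array} \qquad S = \begin{bmatrix} \sin \pi h & \cdots & \sin k\pi h & \cdots \\ \vdots & & \vdots & \\ \sin n\pi h & \cdots & \sin nk\pi h & \cdots \end{bmatrix} \]

Those eigenvectors are orthogonal just like the eigenfunctions: $\int_0^1 \sin j\pi x\,\sin k\pi x\,dx = 0$.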


Problem Set 6.5 

Problems 1-13 are about tests for positive definiteness. 

1 Suppose the 2 by 2 tests $a > 0$ and $ac - b^2 > 0$ are passed. Then $c > b^2/a$ is also positive.

(i) $\lambda_1$ and $\lambda_2$ have the same sign because their product $\lambda_1\lambda_2$ equals ____.

(ii) That sign is positive because $\lambda_1 + \lambda_2$ equals ____.

Conclusion: The tests $a > 0$, $ac - b^2 > 0$ guarantee positive eigenvalues $\lambda_1, \lambda_2$.


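2 Which of $A_1, A_2, A_3, A_4$ has two positive eigenvalues? Use the test, don't compute the $\lambda$'s. Find an $x$ so that $x^TA_1x < 0$, so $A_1$ fails the test.

\[ A_1 = \begin{bmatrix} 5 & 6 \\ 6 & 7 \end{bmatrix} \qquad A_2 = \begin{bmatrix} -1 & -2 \\ -2 & -5 \end{bmatrix} \qquad A_3 = \begin{bmatrix} 1 & 10 \\ 10 & 100 \end{bmatrix} \qquad A_4 = \begin{bmatrix} 1 & 10 \\ 10 & 101 \end{bmatrix} \]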


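3 For which numbers $b$ and $c$ are these matrices positive definite?

\[ A = \begin{bmatrix} 1 & b \\ b & 9 \end{bmatrix} \qquad A = \begin{bmatrix} 2 & 4 \\ 4 & c \end{bmatrix} \qquad A = \begin{bmatrix} c & b \\ b & c \end{bmatrix} \]

With the pivots in $D$ and multiplier in $L$, factor each $A$ into $LDL^T$.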


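4 What is the quadratic $f = ax^2 + 2bxy + cy^2$ for each of these matrices? Complete the square to write $f$ as a sum of one or two squares $d_1(\ \ )^2 + d_2(\ \ )^2$.

\[ A = \begin{bmatrix} 1 & 2 \\ 2 & 9 \end{bmatrix} \quad\text{and}\quad A = \begin{bmatrix} 1 & 3 \\ 3 & 9 \end{bmatrix}. \]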


5 Write $f(x, y) = x^2 + 4xy + 3y^2$ as a difference of squares and find a point $(x, y)$ where $f$ is negative. The minimum is not at (0,0) even though $f$ has positive coefficients.


6 The function f(x,y) = 2xy certainly has a saddle point and not a minimum at 
(0,0). What symmetric matrix A produces this f? What are its eigenvalues? 
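
7 Test to see if $R^TR$ is positive definite in each case:

\[ R = \begin{bmatrix} 1 & 2 \\ 0 & 3 \end{bmatrix} \quad\text{and}\quad R = \begin{bmatrix} 1 & 1 \\ 1 & 2 \\ 2 & 1 \end{bmatrix} \quad\text{and}\quad R = \begin{bmatrix} 1 & 1 & 2 \\ 1 & 2 & 1 \end{bmatrix}. \]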


8 The function $f(x, y) = 3(x + 2y)^2 + 4y^2$ is positive except at (0,0). What is the matrix in $f = \begin{bmatrix} x & y \end{bmatrix}A\begin{bmatrix} x & y \end{bmatrix}^T$? Check that the pivots of $A$ are 3 and 4.


9 Find the 3 by 3 matrix $A$ and its pivots, rank, eigenvalues, and determinant:

\[ \begin{bmatrix} x_1 & x_2 & x_3 \end{bmatrix} A \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = 4(x_1 - x_2 + 2x_3)^2. \]


10 Which 3 by 3 symmetric matrices $A$ and $B$ produce these quadratics?

$x^TAx = 2(x_1^2 + x_2^2 + x_3^2 - x_1x_2 - x_2x_3)$. Why is $A$ positive definite?

$x^TBx = 2(x_1^2 + x_2^2 + x_3^2 - x_1x_2 - x_1x_3 - x_2x_3)$. Why is $B$ semidefinite?


11 Compute the three upper left determinants of $A$ to establish positive definiteness. Verify that their ratios give the second and third pivots.

\[ \text{Pivots = ratios of determinants}\qquad A = \begin{bmatrix} 2 & 2 & 0 \\ 2 & 5 & 3 \\ 0 & 3 & 8 \end{bmatrix} \]


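12 For what numbers $c$ and $d$ are $A$ and $B$ positive definite? Test the 3 determinants:

\[ A = \begin{bmatrix} c & 1 & 1 \\ 1 & c & 1 \\ 1 & 1 & c \end{bmatrix} \quad\text{and}\quad B = \begin{bmatrix} 1 & 2 & 3 \\ 2 & d & 4 \\ 3 & 4 & 5 \end{bmatrix} \]

13 Find a matrix with $a > 0$ and $c > 0$ and $a + c > 2b$ that has a negative eigenvalue.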


Problems 14-20 are about applications of the tests.

14 If $A$ is positive definite then $A^{-1}$ is positive definite. Best proof: The eigenvalues of $A^{-1}$ are positive because ____. Second proof (only for 2 by 2):

\[ \text{The entries of}\quad A^{-1} = \frac{1}{ac - b^2}\begin{bmatrix} c & -b \\ -b & a \end{bmatrix} \quad\text{pass the determinant tests \_\_\_\_.} \]


15 If $A$ and $B$ are positive definite, their sum $A + B$ is positive definite. Pivots and eigenvalues are not convenient for $A + B$. Better to prove $x^T(A + B)x > 0$. Or if $A = R^TR$ and $B = S^TS$, show that $A + B = \begin{bmatrix} R \\ S \end{bmatrix}^T\begin{bmatrix} R \\ S \end{bmatrix}$ with independent columns.

16 A positive definite matrix cannot have a zero (or even worse, a negative number) on its diagonal. Show that this matrix fails to have $x^TAx > 0$:

\[ \begin{bmatrix} x_1 & x_2 & x_3 \end{bmatrix}\begin{bmatrix} 4 & 1 & 1 \\ 1 & 0 & 2 \\ 1 & 2 & 5 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} \quad\text{is not positive when } (x_1, x_2, x_3) = (\ \ ,\ \ ,\ \ ). \]


17 A diagonal entry $a_{jj}$ of a symmetric matrix cannot be smaller than all the $\lambda$'s. If it were, then $A - a_{jj}I$ would have ____ eigenvalues and would be positive definite. But $A - a_{jj}I$ has a ____ on the main diagonal.


18 If $Ax = \lambda x$ then $x^TAx = $ ____. If $x^TAx > 0$, prove that $\lambda > 0$.


19 Reverse Problem 18 to show that if all $\lambda > 0$ then $x^TAx > 0$. We must do this for every nonzero $x$, not just the eigenvectors. So write $x$ as a combination of the eigenvectors and explain why all “cross terms” are $x_i^Tx_j = 0$. Then $x^TAx$ is

\[ (c_1x_1 + \cdots + c_nx_n)^T(c_1\lambda_1x_1 + \cdots + c_n\lambda_nx_n) = c_1^2\lambda_1x_1^Tx_1 + \cdots + c_n^2\lambda_nx_n^Tx_n > 0. \]


20 Give a quick reason why each of these statements is true: 


(a) Every positive definite matrix is invertible.
(b) The only positive definite projection matrix is $P = I$.
(c) A diagonal matrix with positive diagonal entries is positive definite.
(d) A symmetric matrix with a positive determinant might not be positive definite!

Problems 21-24 use the eigenvalues; Problems 25-27 are based on pivots.


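21 For which $s$ and $t$ do $A$ and $B$ have all $\lambda > 0$ (therefore positive definite)?

\[ A = \begin{bmatrix} s & -4 & -4 \\ -4 & s & -4 \\ -4 & -4 & s \end{bmatrix} \quad\text{and}\quad B = \begin{bmatrix} t & 3 & 0 \\ 3 & t & 4 \\ 0 & 4 & t \end{bmatrix} \]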


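22 From $A = Q\Lambda Q^T$ compute the positive definite symmetric square root $Q\Lambda^{1/2}Q^T$ of each matrix. Check that this square root gives $R^2 = A$:

\[ A = \begin{bmatrix} 5 & 4 \\ 4 & 5 \end{bmatrix} \quad\text{and}\quad A = \begin{bmatrix} 10 & 6 \\ 6 & 10 \end{bmatrix} \]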


23 You may have seen the equation for an ellipse as $x^2/a^2 + y^2/b^2 = 1$. What are $a$ and $b$ when the equation is written $\lambda_1x^2 + \lambda_2y^2 = 1$? The ellipse $9x^2 + 4y^2 = 1$ has axes with half-lengths $a = $ ____ and $b = $ ____.


24 Draw the tilted ellipse $x^2 + xy + y^2 = 1$ and find the half-lengths of its axes from the eigenvalues of the corresponding matrix $A$.

25 With positive pivots in $D$, the factorization $A = LDL^T$ becomes $L\sqrt{D}\sqrt{D}L^T$. (Square roots of the pivots give $D = \sqrt{D}\sqrt{D}$.) Then $C = \sqrt{D}L^T$ yields the Cholesky factorization $A = C^TC$ which is “symmetrized $LU$”:

\[ \text{From}\quad C = \begin{bmatrix} 3 & 1 \\ 0 & 2 \end{bmatrix} \quad\text{find } A. \qquad \text{From}\quad A = \begin{bmatrix} 4 & 8 \\ 8 & 25 \end{bmatrix} \quad\text{find } C = \text{chol}(A). \]


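26 In the Cholesky factorization $A = C^TC$, with $C^T = L\sqrt{D}$, the square roots of the pivots are on the diagonal of $C$. Find $C$ (upper triangular) for

\[ A = \begin{bmatrix} 9 & 0 & 0 \\ 0 & 1 & 2 \\ 0 & 2 & 8 \end{bmatrix} \quad\text{and}\quad A = \begin{bmatrix} 1 & 1 & 1 \\ 1 & 2 & 2 \\ 1 & 2 & 7 \end{bmatrix} \]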


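27 The symmetric factorization $A = LDL^T$ means that $x^TAx = x^TLDL^Tx$:

\[ \begin{bmatrix} x & y \end{bmatrix}\begin{bmatrix} a & b \\ b & c \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} x & y \end{bmatrix}\begin{bmatrix} 1 & 0 \\ b/a & 1 \end{bmatrix}\begin{bmatrix} a & 0 \\ 0 & (ac - b^2)/a \end{bmatrix}\begin{bmatrix} 1 & b/a \\ 0 & 1 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} \]

The left side is $ax^2 + 2bxy + cy^2$. The right side is $a\left(x + \frac{b}{a}y\right)^2 + $ ____ $y^2$. The second pivot completes the square! Test with $a = 2$, $b = 4$, $c = 10$.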


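28 Without multiplying $A = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}\begin{bmatrix} 2 & 0 \\ 0 & 5 \end{bmatrix}\begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix}$, find

(a) the determinant of $A$ (b) the eigenvalues of $A$
(c) the eigenvectors of $A$ (d) a reason why $A$ is symmetric positive definite.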


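29 For $F_1(x, y) = \frac{1}{4}x^4 + x^2y + y^2$ and $F_2(x, y) = x^3 + xy - x$ find the second derivative matrices $H_1$ and $H_2$:

\[ \text{Test for minimum}\qquad H = \begin{bmatrix} \partial^2F/\partial x^2 & \partial^2F/\partial x\,\partial y \\ \partial^2F/\partial y\,\partial x & \partial^2F/\partial y^2 \end{bmatrix} \quad\text{is positive definite} \]

$H_1$ is positive definite so $F_1$ is concave up (= convex). Find the minimum point of $F_1$ and the saddle point of $F_2$ (look only where first derivatives are zero).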


30 The graph of $z = x^2 + y^2$ is a bowl opening upward. The graph of $z = x^2 - y^2$ is a saddle. The graph of $z = -x^2 - y^2$ is a bowl opening downward. What is a test on $a, b, c$ for $z = ax^2 + 2bxy + cy^2$ to have a saddle point at (0,0)?


31 Which values of $c$ give a bowl and which $c$ give a saddle point for the graph of $z = 4x^2 + 12xy + cy^2$? Describe this graph at the borderline value of $c$.

Challenge Problems


32 A group of nonsingular matrices includes $AB$ and $A^{-1}$ if it includes $A$ and $B$. “Products and inverses stay in the group.” Which of these are groups (as in 2.7.37)? Invent a “subgroup” of two of these groups (not $I$ by itself = the smallest group).

(a) Positive definite symmetric matrices $A$.
(b) Orthogonal matrices $Q$.
(c) All exponentials $e^{tA}$ of a fixed matrix $A$.
(d) Matrices $P$ with positive eigenvalues.
(e) Matrices $D$ with determinant 1.


33 When $A$ and $B$ are symmetric positive definite, $AB$ might not even be symmetric. But its eigenvalues are still positive. Start from $ABx = \lambda x$ and take dot products with $Bx$. Then prove $\lambda > 0$.


34 Write down the 5 by 5 sine matrix $S$ from Worked Example 6.5 D, containing the eigenvectors of $K$ when $n = 5$ and $h = 1/6$. Multiply $K$ times $S$ to see the five positive eigenvalues.

Their sum should equal the trace 10. Their product should be $\det K = 6$.


35 Suppose $C$ is positive definite (so $y^TCy > 0$ whenever $y \neq 0$) and $A$ has independent columns (so $Ax \neq 0$ whenever $x \neq 0$). Apply the energy test to $x^TA^TCAx$ to show that $A^TCA$ is positive definite: the crucial matrix in engineering.

6.6 Similar Matrices


The key step in this chapter is to diagonalize a matrix by using its eigenvectors. When $S$ is the eigenvector matrix, the diagonal matrix $S^{-1}AS$ is $\Lambda$, the eigenvalue matrix. But diagonalization is not possible for every $A$. Some matrices have too few eigenvectors, and we had to leave them alone. In this new section, the eigenvector matrix $S$ remains the best choice when we can find it, but now we allow any invertible matrix $M$.

Starting from $A$ we go to $M^{-1}AM$. This matrix may be diagonal, but probably not. It still shares important properties of $A$. No matter which $M$ we choose, the eigenvalues stay the same. The matrices $A$ and $M^{-1}AM$ are called “similar”. A typical matrix $A$ is similar to a whole family of other matrices because there are so many choices of $M$.


DEFINITION. Let $M$ be any invertible matrix. Then $B = M^{-1}AM$ is similar to $A$.


If $B = M^{-1}AM$ then immediately $A = MBM^{-1}$. That means: If $B$ is similar to $A$ then $A$ is similar to $B$. The matrix in this reverse direction is $M^{-1}$, just as good as $M$.

A diagonalizable matrix is similar to $\Lambda$. In that special case $M$ is $S$. We have $A = S\Lambda S^{-1}$ and $\Lambda = S^{-1}AS$. They certainly have the same eigenvalues! This section is opening up to other similar matrices $B = M^{-1}AM$, by allowing all invertible $M$.

The combination $M^{-1}AM$ appears when we change variables in a differential equation. Start with an equation for $u$ and set $u = Mv$:

\[ \frac{du}{dt} = Au \quad\text{becomes}\quad M\frac{dv}{dt} = AMv \quad\text{which is}\quad \frac{dv}{dt} = M^{-1}AMv. \]

The original coefficient matrix was $A$, the new one at the right is $M^{-1}AM$. Changing $u$ to $v$ leads to a similar matrix. When $M = S$ the new system is diagonal, the maximum in simplicity. Other choices of $M$ could make the new system triangular and easier to solve. Since we can always go back to $u$, similar matrices must give the same growth or decay. More precisely, the eigenvalues of $A$ and $B$ are the same.


(No change in $\lambda$'s) Similar matrices $A$ and $M^{-1}AM$ have the same eigenvalues.
If $x$ is an eigenvector of $A$, then $M^{-1}x$ is an eigenvector of $B = M^{-1}AM$.

The proof is quick, since $B = M^{-1}AM$ gives $A = MBM^{-1}$. Suppose $Ax = \lambda x$:

\[ MBM^{-1}x = \lambda x \quad\text{means that}\quad B(M^{-1}x) = \lambda(M^{-1}x). \]

The eigenvalue of $B$ is the same $\lambda$. The eigenvector has changed to $M^{-1}x$.
Two matrices can have the same repeated $\lambda$, and fail to be similar, as we will see.

Example 1 These matrices $M^{-1}AM$ all have the same eigenvalues 1 and 0.

The projection $A = \begin{bmatrix} .5 & .5 \\ .5 & .5 \end{bmatrix}$ is similar to $\Lambda = S^{-1}AS = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}$.

Now choose $M = \begin{bmatrix} 1 & 0 \\ 1 & 2 \end{bmatrix}$. The similar matrix $M^{-1}AM$ is $\begin{bmatrix} 1 & 1 \\ 0 & 0 \end{bmatrix}$.

Also choose $M = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}$. The similar matrix $M^{-1}AM$ is $\begin{bmatrix} .5 & -.5 \\ -.5 & .5 \end{bmatrix}$.

All 2 by 2 matrices with those eigenvalues 1 and 0 are similar to each other. The eigenvectors change with $M$, the eigenvalues don't change.
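
Example 1 can be reproduced with exact rational arithmetic. The sketch below uses Python's `fractions` module; the helper names are illustrative. Trace and determinant are printed because, for a 2 by 2 matrix, they determine the eigenvalues.

```python
from fractions import Fraction as F


def matmul2(A, B):
    """Product of two 2 by 2 matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def inverse2(M):
    """Inverse of an invertible 2 by 2 matrix."""
    (a, b), (c, d) = M
    det = a * d - b * c
    if det == 0:
        raise ValueError("M is not invertible")
    return [[d / det, -b / det], [-c / det, a / det]]


def similar(A, M):
    """The similar matrix M^{-1} A M."""
    return matmul2(matmul2(inverse2(M), A), M)


def trace(A):
    return A[0][0] + A[1][1]


def det(A):
    return A[0][0] * A[1][1] - A[0][1] * A[1][0]


half = F(1, 2)
A = [[half, half], [half, half]]            # the projection in Example 1
M1 = [[F(1), F(0)], [F(1), F(2)]]
M2 = [[F(0), F(-1)], [F(1), F(0)]]

for M in (M1, M2):
    B = similar(A, M)
    print(B, "trace", trace(B), "determinant", det(B))
```

Both similar matrices keep trace 1 and determinant 0, so both keep the eigenvalues 1 and 0.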


The eigenvalues in that example are not repeated. This makes life easy. Repeated eigenvalues are harder. The next example has eigenvalues 0 and 0. The zero matrix shares those eigenvalues, but it is similar only to itself: $M^{-1}0M = 0$.


Example 2 A family of similar matrices with $\lambda = 0, 0$ (repeated eigenvalue)

\[ A = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} \text{ is similar to } \begin{bmatrix} 1 & -1 \\ 1 & -1 \end{bmatrix} \text{ and all } B = \begin{bmatrix} cd & d^2 \\ -c^2 & -cd \end{bmatrix} \text{ except } \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}. \]

These matrices $B$ all have zero determinant (like $A$). They all have rank one (like $A$). One eigenvalue is zero and the trace is $cd - dc = 0$, so the other must be zero. I chose any $M = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$ with $ad - bc = 1$, and $B = M^{-1}AM$.


These matrices B can’t be diagonalized. In fact A is as close to diagonal as possible. 
It is the “Jordan form” for the family of matrices B. This is the outstanding member 
(my class says “Godfather”) of the family. The Jordan form J = A is as near as we can 
come to diagonalizing these matrices, when there is only one eigenvector. In going from A 
to $B = M^{-1}AM$, some things change and some don’t. Here is a table to show this.


Not changed by M                       Changed by M
-------------------------------------  ---------------
Eigenvalues                            Eigenvectors
Trace and determinant                  Nullspace
Rank                                   Column space
Number of independent eigenvectors     Row space
Jordan form                            Left nullspace
                                       Singular values


The eigenvalues don’t change for similar matrices; the eigenvectors do. The trace is
the sum of the $\lambda$’s (unchanged). The determinant is the product of the same $\lambda$’s.¹ The
nullspace consists of the eigenvectors for $\lambda = 0$ (if any), so it can change. Its dimension
$n - r$ does not change! The number of eigenvectors stays the same for each $\lambda$, while the
vectors themselves are multiplied by $M^{-1}$. The singular values depend on $A^TA$, which
definitely changes. They come in the next section.


¹ The determinant is unchanged because $\det B = (\det M^{-1})(\det A)(\det M) = \det A$.
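Some rows of that table can be spot-checked numerically. This is only a sketch in Python with NumPy; the matrices A and M are arbitrary choices, and the rank is counted from singular values above a small tolerance (an assumption of this example, not a method from the text).

```python
import numpy as np

A = np.array([[2.0, 1.0, 0.0],
              [0.0, 2.0, 0.0],
              [0.0, 0.0, 0.0]])          # rank 2, eigenvalues 2, 2, 0
M = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [1.0, 0.0, 2.0]])          # det M = 3, so M is invertible
B = np.linalg.inv(M) @ A @ M

same_trace = np.isclose(np.trace(A), np.trace(B))
same_det = np.isclose(np.linalg.det(A), np.linalg.det(B), atol=1e-12)

sv_A = np.linalg.svd(A, compute_uv=False)
sv_B = np.linalg.svd(B, compute_uv=False)
rank_A = int(np.sum(sv_A > 1e-10))       # rank = number of nonzero singular values
rank_B = int(np.sum(sv_B > 1e-10))
print(same_trace, same_det, rank_A, rank_B)
print(sv_A, sv_B)                        # the singular values themselves differ
```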
Examples of the Jordan Form


The Jordan form is the serious new idea here. We lead up to it with one more example of 
similar matrices: triple eigenvalue, one eigenvector. 


Example 3 This Jordan matrix $J$ has $\lambda = 5, 5, 5$ on its diagonal. Its only eigenvectors
are multiples of $x = (1, 0, 0)$. Algebraic multiplicity is 3, geometric multiplicity is 1:


$$\text{If}\quad J = \begin{bmatrix} 5 & 1 & 0 \\ 0 & 5 & 1 \\ 0 & 0 & 5 \end{bmatrix} \quad\text{then}\quad J - 5I = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix} \quad\text{has rank 2.}$$


Every similar matrix $B = M^{-1}JM$ has the same triple eigenvalue 5, 5, 5. Also $B - 5I$
must have the same rank 2. Its nullspace has dimension 1. So every $B$ that is similar to this
“Jordan block” $J$ has only one independent eigenvector $M^{-1}x$.


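The transpose matrix $J^T$ has the same eigenvalues 5, 5, 5, and $J^T - 5I$ has the same
rank 2. Jordan’s theorem says that $J^T$ is similar to $J$. The matrix $M$ that produces the
similarity happens to be the reverse identity:


$$J^T = M^{-1}JM \quad\text{is}\quad \begin{bmatrix} 5 & & \\ 1 & 5 & \\ & 1 & 5 \end{bmatrix} = \begin{bmatrix} & & 1 \\ & 1 & \\ 1 & & \end{bmatrix} \begin{bmatrix} 5 & 1 & \\ & 5 & 1 \\ & & 5 \end{bmatrix} \begin{bmatrix} & & 1 \\ & 1 & \\ 1 & & \end{bmatrix}.$$


All blank entries are zero. An eigenvector of $J^T$ is $M^{-1}(1, 0, 0) = (0, 0, 1)$. There is one
line of eigenvectors $(x_1, 0, 0)$ for $J$ and another line $(0, 0, x_3)$ for $J^T$.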
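The reverse identity is easy to try out. The following sketch (Python with NumPy, an assumption of this example) builds the 3 by 3 Jordan block with 5 on the diagonal, forms M^{-1}JM with the reverse identity M, and checks the rank of J - 5I.

```python
import numpy as np

J = np.array([[5.0, 1.0, 0.0],
              [0.0, 5.0, 1.0],
              [0.0, 0.0, 5.0]])
M = np.fliplr(np.eye(3))               # reverse identity; M equals its own inverse

JT_from_similarity = np.linalg.inv(M) @ J @ M
rank_J_minus_5I = np.linalg.matrix_rank(J - 5 * np.eye(3))
eigvec_JT = np.linalg.inv(M) @ np.array([1.0, 0.0, 0.0])
print(JT_from_similarity)
print(rank_J_minus_5I, eigvec_JT)
```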


The key fact is that this matrix $J$ is similar to every matrix $A$ with eigenvalues 5, 5, 5
and one line of eigenvectors. There is an $M$ with $M^{-1}AM = J$.


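Example 4 Since $J$ is as close to diagonal as we can get, the equation $du/dt = Ju$
cannot be simplified by changing variables. We must solve it as it stands:


$$\frac{du}{dt} = Ju = \begin{bmatrix} 5 & 1 & 0 \\ 0 & 5 & 1 \\ 0 & 0 & 5 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \end{bmatrix} \quad\text{is}\quad \begin{array}{l} dx/dt = 5x + y \\ dy/dt = 5y + z \\ dz/dt = 5z. \end{array}$$


The system is triangular. We think naturally of back substitution. Solve the last equation
and work upwards. Main point: All solutions contain $e^{5t}$ since $\lambda = 5$:

$$\begin{array}{lll} \text{Last equation} & \dfrac{dz}{dt} = 5z & \text{yields}\quad z = z(0)e^{5t} \\[4pt] \text{Notice } te^{5t} & \dfrac{dy}{dt} = 5y + z & \text{yields}\quad y = (y(0) + tz(0))e^{5t} \\[4pt] \text{Notice } t^2e^{5t} & \dfrac{dx}{dt} = 5x + y & \text{yields}\quad x = (x(0) + ty(0) + \tfrac{1}{2}t^2z(0))e^{5t}. \end{array}$$


The two missing eigenvectors are responsible for the $te^{5t}$ and $t^2e^{5t}$ terms in $y$ and $x$.
The factors $t$ and $t^2$ enter because $\lambda = 5$ is a triple eigenvalue with one eigenvector.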
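One way to gain confidence in those formulas is to differentiate them numerically. The sketch below (Python with NumPy; the starting values and the time t are arbitrary) evaluates the closed-form solution and compares a central-difference estimate of du/dt with Ju.

```python
import numpy as np

J = np.array([[5.0, 1.0, 0.0],
              [0.0, 5.0, 1.0],
              [0.0, 0.0, 5.0]])

def u(t, u0):
    """Closed-form solution of du/dt = Ju from Example 4."""
    x0, y0, z0 = u0
    e = np.exp(5.0 * t)
    return np.array([(x0 + t * y0 + 0.5 * t**2 * z0) * e,
                     (y0 + t * z0) * e,
                     z0 * e])

u0 = (1.0, 2.0, 3.0)                   # arbitrary starting values
t, h = 0.3, 1e-6
derivative = (u(t + h, u0) - u(t - h, u0)) / (2 * h)   # central difference
residual = np.linalg.norm(derivative - J @ u(t, u0))
print(residual)
```

The residual is limited only by the finite-difference step, so it should be tiny compared with the size of Ju.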
Note Chapter 7 will explain another approach to similar matrices. Instead of changing
variables by $u = Mv$, we “change the basis”. In this approach, similar matrices will
represent the same transformation of n-dimensional space. When we choose a basis for
$\mathbf{R}^n$, we get a matrix. The standard basis vectors ($M = I$) lead to $I^{-1}AI$ which is $A$.
Other bases lead to similar matrices $B = M^{-1}AM$.


The Jordan Form 


For every $A$, we want to choose $M$ so that $M^{-1}AM$ is as nearly diagonal as possible.
When $A$ has a full set of $n$ eigenvectors, they go into the columns of $M$. Then $M = S$.
The matrix $S^{-1}AS$ is diagonal, period. This matrix $\Lambda$ is the Jordan form of $A$, when $A$
can be diagonalized. In the general case, eigenvectors are missing and $\Lambda$ can’t be reached.

Suppose $A$ has $s$ independent eigenvectors. Then it is similar to a matrix with $s$ blocks.
Each block is like $J$ in Example 3. The eigenvalue is on the diagonal with 1’s just above it.
This block accounts for one eigenvector of $A$. When there are $n$ eigenvectors and $n$ blocks,
they are all 1 by 1. In that case $J$ is $\Lambda$.


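(Jordan form) If $A$ has $s$ independent eigenvectors, it is similar to a matrix $J$ that has $s$
Jordan blocks on its diagonal: Some matrix $M$ puts $A$ into Jordan form:


$$\text{Jordan form}\qquad M^{-1}AM = \begin{bmatrix} J_1 & & \\ & \ddots & \\ & & J_s \end{bmatrix} = J.$$


Each block $J_i$ has one eigenvalue $\lambda_i$ on its diagonal, with 1’s just above the diagonal:


$$\text{Jordan block}\qquad J_i = \begin{bmatrix} \lambda_i & 1 & & \\ & \ddots & \ddots & \\ & & \lambda_i & 1 \\ & & & \lambda_i \end{bmatrix}.$$

$A$ is similar to $B$ if they share the same Jordan form $J$, not otherwise.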


The Jordan form $J$ has an off-diagonal 1 for each missing eigenvector (and the 1’s are next
to the eigenvalues). This is the big theorem about matrix similarity. In every family of
similar matrices, we are picking one outstanding member called $J$. It is nearly diagonal (or
if possible completely diagonal). For that $J$, we can solve $du/dt = Ju$ as in Example 4.
We can take powers $J^k$ as in Problems 9-10. Every other matrix in the family has the form
$A = MJM^{-1}$. The connection through $M$ solves $du/dt = Au$.

The point you must see is that $MJM^{-1}MJM^{-1} = MJ^2M^{-1}$. That cancellation of
$M^{-1}M$ in the middle has been used through this chapter (when $M$ was $S$). We found $A^{100}$
from $S\Lambda^{100}S^{-1}$ by diagonalizing the matrix. Now we can’t quite diagonalize $A$. So we
use $MJ^{100}M^{-1}$ instead.
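The cancellation of M^{-1}M is easy to watch in a small case. This sketch (Python with NumPy) uses a 2 by 2 Jordan block with eigenvalue 2 and an arbitrary invertible M, both chosen only for illustration, and the power k = 10 rather than 100 to keep the numbers small.

```python
import numpy as np

J = np.array([[2.0, 1.0],
              [0.0, 2.0]])               # one Jordan block, eigenvalue 2
M = np.array([[1.0, 1.0],
              [1.0, 2.0]])               # any invertible M (illustrative)
M_inv = np.linalg.inv(M)
A = M @ J @ M_inv                        # A is not diagonalizable

k = 10
J_k = np.linalg.matrix_power(J, k)
via_jordan = M @ J_k @ M_inv             # M J^k M^{-1}
direct = np.linalg.matrix_power(A, k)    # A^k by repeated multiplication
print(J_k)
print(np.allclose(via_jordan, direct))
```

The printed power of J shows the pattern that Problem 10 asks the reader to guess: the eigenvalue power stays on the diagonal and a multiple of a lower power appears above it.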
Jordan’s Theorem is proved in my textbook Linear Algebra and Its Applications.
Please refer to that book (or more advanced books) for the proof. The reasoning is rather
intricate and in actual computations the Jordan form is not at all popular: its calculation
is not stable. A slight change in $A$ will separate the repeated eigenvalues and remove the
off-diagonal 1’s, switching to a diagonal $\Lambda$.

Proved or not, you have caught the central idea of similarity—to make A as simple as 
possible while preserving its essential properties. 


■ REVIEW OF THE KEY IDEAS ■


1. $B$ is similar to $A$ if $B = M^{-1}AM$, for some invertible matrix $M$.

2. Similar matrices have the same eigenvalues. Eigenvectors are multiplied by $M^{-1}$.

3. If $A$ has $n$ independent eigenvectors then $A$ is similar to $\Lambda$ (take $M = S$).

4. Every matrix is similar to a Jordan matrix $J$ (which has $\Lambda$ as its diagonal part). $J$
has a block for each eigenvector, and 1’s for missing eigenvectors.


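■ WORKED EXAMPLES ■


6.6 A The 4 by 4 triangular Pascal matrix $A$ and its inverse (alternating diagonals) are


$$A = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ 1 & 2 & 1 & 0 \\ 1 & 3 & 3 & 1 \end{bmatrix} \quad\text{and}\quad A^{-1} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ -1 & 1 & 0 & 0 \\ 1 & -2 & 1 & 0 \\ -1 & 3 & -3 & 1 \end{bmatrix}.$$


Check that $A$ and $A^{-1}$ have the same eigenvalues. Find a diagonal matrix $D$ with alternating
signs that gives $A^{-1} = D^{-1}AD$. This $A$ is similar to $A^{-1}$, which is unusual.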

These similar matrices must have the same Jordan form J. This J has only one block 
because the Pascal matrix has only one line of eigenvectors. 


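Solution The triangular matrices $A$ and $A^{-1}$ both have $\lambda = 1, 1, 1, 1$ on their main
diagonals. Choose $D$ with alternating 1 and $-1$ on its diagonal. $D$ equals $D^{-1}$:


$$D^{-1}AD = \begin{bmatrix} -1 & & & \\ & 1 & & \\ & & -1 & \\ & & & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ 1 & 2 & 1 & 0 \\ 1 & 3 & 3 & 1 \end{bmatrix} \begin{bmatrix} -1 & & & \\ & 1 & & \\ & & -1 & \\ & & & 1 \end{bmatrix} = A^{-1}.$$


Check: Changing signs in rows 1 and 3 of $A$, and columns 1 and 3, produces the four
negative entries in $A^{-1}$. We are multiplying row $i$ by $(-1)^i$ and column $j$ by $(-1)^j$, which
gives the alternating diagonals in $A^{-1}$. Then $AD$ has columns with alternating signs.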
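The Pascal similarity can be confirmed in a few lines. This sketch assumes Python with NumPy and simply compares D^{-1}AD with the inverse computed by the library.

```python
import numpy as np

A = np.array([[1, 0, 0, 0],
              [1, 1, 0, 0],
              [1, 2, 1, 0],
              [1, 3, 3, 1]], dtype=float)
D = np.diag([-1.0, 1.0, -1.0, 1.0])    # alternating signs, so D equals its inverse

A_inv = np.linalg.inv(A)
similar = np.linalg.inv(D) @ A @ D
print(similar)
print(np.allclose(similar, A_inv))
```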
6.6 B The best way to compute eigenvalues of a large matrix is not from solving
$\det(A - \lambda I) = 0$. That high degree polynomial is a numerical disaster.


Instead we compute similar matrices $A_1, A_2, \ldots$ that approach a triangular matrix. Then
the eigenvalues of $A$ (unchanged) are almost sitting on the main diagonal.

One way is to factor $A = QR$ by “Gram-Schmidt”. Reverse the order to $A_1 = RQ$.
This matrix is similar to $A$ because $RQ = Q^{-1}(QR)Q$. An example with $c = \cos\theta$ and
$s = \sin\theta$ shows how a small off-diagonal $s$ can be cubed in $A_1$:


$$A = \begin{bmatrix} c & s \\ s & 0 \end{bmatrix} \quad\text{factors into}\quad \begin{bmatrix} c & s \\ s & -c \end{bmatrix} \begin{bmatrix} 1 & cs \\ 0 & s^2 \end{bmatrix} = QR.$$


$$A_1 = RQ = \begin{bmatrix} c + cs^2 & s^3 \\ s^3 & -cs^2 \end{bmatrix} \quad\text{has } s^3 \text{ below the diagonal.}$$


Another step can factor $A_1 = Q_1R_1$ and reverse to $A_2 = R_1Q_1$. This QR method is in
Section 9.3 with a further improvement for $A_1$. Add $cs^2$ to its diagonal (to get zero in the
corner) and then subtract back from $A_2$:


Shift and factor $A_1 + cs^2I = Q_1R_1$     Reverse and shift back $A_2 = R_1Q_1 - cs^2I$


Shifted QR is an amazing success, just about the best way to compute eigenvalues.
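Here is a small sketch of one unshifted step and one shifted step, written in Python with NumPy. The function name `qr_step` and the angle 0.1 are choices made for this illustration. NumPy's `qr` may pick different signs for the columns of Q than Gram-Schmidt would, so the off-diagonal entry is compared by absolute value.

```python
import numpy as np

def qr_step(A, shift=0.0):
    """One step of the QR method: factor A + shift*I = QR, return RQ - shift*I."""
    n = A.shape[0]
    Q, R = np.linalg.qr(A + shift * np.eye(n))
    return R @ Q - shift * np.eye(n)

theta = 0.1
c, s = np.cos(theta), np.sin(theta)
A = np.array([[c, s],
              [s, 0.0]])

A1 = qr_step(A)                         # off-diagonal entries have size s**3
A2 = qr_step(A1, shift=c * s**2)        # the shifted step described in the text
print(abs(A1[1, 0]), s**3)
print(abs(A2[1, 0]))
```

With this shift the corner entry of the shifted matrix is zero, and the off-diagonal entry of A2 comes out far smaller than the one in A1.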


Problem Set 6.6 


1 If $C = F^{-1}AF$ and also $C = G^{-1}BG$, what matrix $M$ gives $B = M^{-1}AM$?
Conclusion: If $C$ is similar to $A$ and also to $B$ then ____.


2 If $A = \text{diag}(1, 3)$ and $B = \text{diag}(3, 1)$ show that $A$ and $B$ are similar (find an $M$).
3 Show that $A$ and $B$ are similar by finding $M$ so that $B = M^{-1}AM$:


4 If a 2 by 2 matrix $A$ has eigenvalues 0 and 1, why is it similar to $\Lambda = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}$?
Deduce from Problem 1 that all 2 by 2 matrices with those eigenvalues are similar.


5 Which of these six matrices are similar? Check their eigenvalues. 


fot) [eol Lo of fr a} fio [e i} 
6 There are sixteen 2 by 2 matrices whose entries are 0’s and 1’s. Similar matrices go
into the same family. How many families? How many matrices (total 16) in each
family?


7 (a) If $x$ is in the nullspace of $A$ show that $M^{-1}x$ is in the nullspace of $M^{-1}AM$.
(b) The nullspaces of $A$ and $M^{-1}AM$ have the same (vectors)(basis)(dimension).


8 Suppose $Ax = \lambda x$ and $Bx = \lambda x$ with the same $\lambda$’s and $x$’s. With $n$ independent
eigenvectors we have $A = B$: Why? Find $A \ne B$ when both have eigenvalues 0, 0
but only one line of eigenvectors $(x_1, 0)$.


9 By direct multiplication find $A^2$ and $A^3$ and $A^5$ when


Guess the form of $A^k$. Set $k = 0$ to find $A^0$ and $k = -1$ to find $A^{-1}$.
Questions 10-14 are about the Jordan form.


10 By direct multiplication, find $J^2$ and $J^3$ when


$$J = \begin{bmatrix} \lambda & 1 \\ 0 & \lambda \end{bmatrix}.$$

Guess the form of $J^k$. Set $k = 0$ to find $J^0$. Set $k = -1$ to find $J^{-1}$.


11 Solve $du/dt = Ju$ for $J$ in Problem 10, starting from $u(0) = (5, 2)$. Remember
$te^{\lambda t}$.


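12 These Jordan matrices have eigenvalues 0, 0, 0, 0. They have two eigenvectors (one
from each block). But the block sizes don’t match and they are not similar:


$$J = \begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \end{bmatrix} \quad\text{and}\quad K = \begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}$$


For any matrix $M$, compare $JM$ with $MK$. If they are equal show that $M$ is not
invertible. Then $M^{-1}JM = K$ is impossible: $J$ is not similar to $K$.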


13 Based on Problem 12, what are the five Jordan forms when $\lambda = 0, 0, 0, 0$?
14 Prove that $A^T$ is always similar to $A$ (we know the $\lambda$’s are the same):
1. For one Jordan block $J_i$: Find $M_i$ so that $M_i^{-1}J_iM_i = J_i^T$ (see Example 3).


2. For any $J$ with blocks $J_i$: Build $M_0$ from blocks so that $M_0^{-1}JM_0 = J^T$.
3. For any $A = MJM^{-1}$: Show that $A^T$ is similar to $J^T$ and so to $J$ and to $A$.
Prove that $\det(A - \lambda I) = \det(M^{-1}AM - \lambda I)$. (You could write $I = M^{-1}M$
and factor out $\det M^{-1}$ and $\det M$.) Since these characteristic polynomials of $A$ and
$M^{-1}AM$ are the same, the eigenvalues are the same (with the same multiplicities).


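Which pairs are similar? Choose $a, b, c, d$ to prove that the other pairs aren’t:


$$\begin{bmatrix} a & b \\ c & d \end{bmatrix} \qquad \begin{bmatrix} b & a \\ d & c \end{bmatrix} \qquad \begin{bmatrix} c & d \\ a & b \end{bmatrix} \qquad \begin{bmatrix} d & c \\ b & a \end{bmatrix}$$
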
True or false, with a good reason: 


(a) A symmetric matrix can’t be similar to a nonsymmetric matrix. 
(b) An invertible matrix can’t be similar to a singular matrix. 

(c) A can’t be similar to —A unless A = 0. 

(d) $A$ can’t be similar to $A + I$.


If B is invertible, prove that AB is similar to BA. They have the same eigenvalues. 


If $A$ is 6 by 4 and $B$ is 4 by 6, $AB$ and $BA$ have different sizes. But with blocks

$$M^{-1}FM = \begin{bmatrix} I & -A \\ 0 & I \end{bmatrix} \begin{bmatrix} AB & 0 \\ B & 0 \end{bmatrix} \begin{bmatrix} I & A \\ 0 & I \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ B & BA \end{bmatrix} = G.$$

(a) What sizes are the four blocks (the same four sizes in each matrix)?


(b) This equation is $M^{-1}FM = G$, so $F$ and $G$ have the same 10 eigenvalues.
$F$ has the 6 eigenvalues of $AB$ plus 4 zeros; $G$ has the 4 eigenvalues of $BA$
plus 6 zeros. $AB$ has the same eigenvalues as $BA$ plus ____ zeros.


Why are these statements all true? 


(a) If $A$ is similar to $B$ then $A^2$ is similar to $B^2$.

(b) $A^2$ and $B^2$ can be similar when $A$ and $B$ are not similar (try $\lambda = 0, 0$).

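(c) $\begin{bmatrix} 3 & 0 \\ 0 & 4 \end{bmatrix}$ is similar to $\begin{bmatrix} 3 & 1 \\ 0 & 4 \end{bmatrix}$.

(d) $\begin{bmatrix} 3 & 0 \\ 0 & 3 \end{bmatrix}$ is not similar to $\begin{bmatrix} 3 & 1 \\ 0 & 3 \end{bmatrix}$.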

(e) If we exchange rows 1 and 2 of $A$, and then exchange columns 1 and 2, the
eigenvalues stay the same. In this case $M$ = ____.


If $J$ is the 5 by 5 Jordan block with $\lambda = 0$, find $J^2$ and count its eigenvectors and
find its Jordan form (there will be two blocks).


Challenge Problems 


If an $n$ by $n$ matrix $A$ has all eigenvalues $\lambda = 0$, prove that $A^n$ = zero matrix.
(Maybe prove first that $J^n$ = zero matrix, by direct multiplication. Or use the Cayley-Hamilton
Theorem?)


For the shifted QR method in the Worked Example 6.6 B, show that $A_2$ is similar to
$A_1$. No change in eigenvalues, and the $A$’s quickly approach a diagonal matrix.


If $A$ is similar to $A^{-1}$, must all the eigenvalues equal 1 or $-1$?
6.7 Singular Value Decomposition (SVD)


The Singular Value Decomposition is a highlight of linear algebra. $A$ is any $m$ by $n$ matrix,
square or rectangular. Its rank is $r$. We will diagonalize this $A$, but not by $S^{-1}AS$.
The eigenvectors in $S$ have three big problems: They are usually not orthogonal, there are
not always enough eigenvectors, and $Ax = \lambda x$ requires $A$ to be square. The singular
vectors of $A$ solve all those problems in a perfect way.

The price we pay is to have two sets of singular vectors, $u$’s and $v$’s. The $u$’s are eigenvectors
of $AA^T$ and the $v$’s are eigenvectors of $A^TA$. Since those matrices are
both symmetric, their eigenvectors can be chosen orthonormal. In equation (13) below,
the simple fact that $A$ times $A^TA$ is the same as $AA^T$ times $A$ will lead to a remarkable
property of these $u$’s and $v$’s:


$$\text{“}A\text{ is diagonalized”}\qquad Av_1 = \sigma_1u_1 \quad Av_2 = \sigma_2u_2 \quad \ldots \quad Av_r = \sigma_ru_r \qquad (1)$$

The singular vectors $v_1, \ldots, v_r$ are in the row space of $A$. The outputs $u_1, \ldots, u_r$ are in
the column space of $A$. The singular values $\sigma_1, \ldots, \sigma_r$ are all positive numbers. When the
$v$’s and $u$’s go into the columns of $V$ and $U$, orthogonality gives $V^TV = I$ and $U^TU = I$.
The $\sigma$’s go into a diagonal matrix $\Sigma$.

Just as $Ax_i = \lambda_ix_i$ led to the diagonalization $AS = S\Lambda$, the equations $Av_i = \sigma_iu_i$
tell us column by column that $AV = U\Sigma$:


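$$\begin{array}{c}(m \text{ by } n)(n \text{ by } r) \\ \text{equals} \\ (m \text{ by } r)(r \text{ by } r)\end{array} \qquad A\begin{bmatrix} v_1 & \cdots & v_r \end{bmatrix} = \begin{bmatrix} u_1 & \cdots & u_r \end{bmatrix} \begin{bmatrix} \sigma_1 & & \\ & \ddots & \\ & & \sigma_r \end{bmatrix}. \qquad (2)$$


This is the heart of the SVD, but there is more. Those $v$’s and $u$’s account for the row
space and column space of $A$. We need $n - r$ more $v$’s and $m - r$ more $u$’s, from the
nullspace $N(A)$ and the left nullspace $N(A^T)$. They can be orthonormal bases for those
two nullspaces (and then automatically orthogonal to the first $r$ $v$’s and $u$’s). Include all
the $v$’s and $u$’s in $V$ and $U$, so these matrices become square. We still have $AV = U\Sigma$.

$$\begin{array}{c}(m \text{ by } n)(n \text{ by } n) \\ \text{equals} \\ (m \text{ by } m)(m \text{ by } n)\end{array} \qquad A\begin{bmatrix} v_1 & \cdots & v_r & \cdots & v_n \end{bmatrix} = \begin{bmatrix} u_1 & \cdots & u_r & \cdots & u_m \end{bmatrix} \begin{bmatrix} \sigma_1 & & & \\ & \ddots & & \\ & & \sigma_r & \\ & & & 0 \end{bmatrix}. \qquad (3)$$


The new $\Sigma$ is $m$ by $n$. It is just the old $r$ by $r$ matrix (call that $\Sigma_r$) with $m - r$ new zero
rows and $n - r$ new zero columns. The real change is in the shapes of $U$ and $V$ and $\Sigma$.
Still $V^TV = I$ and $U^TU = I$, with sizes $n$ and $m$.

$V$ is now a square orthogonal matrix, with inverse $V^{-1} = V^T$. So $AV = U\Sigma$ can
become $A = U\Sigma V^T$. This is the Singular Value Decomposition: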


$$\text{SVD}\qquad A = U\Sigma V^T = u_1\sigma_1v_1^T + \cdots + u_r\sigma_rv_r^T. \qquad (4)$$

I would write the earlier “reduced SVD” from equation (2) as $A = U_r\Sigma_rV_r^T$.
That is equally true, without the extra zeros in $\Sigma$. This reduced SVD gives the same
splitting of $A$ into a sum of $r$ matrices, each of rank one.

We will see that $\sigma_i^2 = \lambda_i$ is an eigenvalue of $A^TA$ and also $AA^T$. When we put the
singular values in descending order, $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_r > 0$, the splitting in equation (4)
gives the $r$ rank-one pieces of $A$ in order of importance.
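The full SVD, the reduced SVD, and the sum of rank-one pieces can be compared side by side. This is a sketch in Python with NumPy (an assumption of this example; the book's own codes are in MATLAB). The 4 by 3 matrix is an arbitrary rank-2 example, and the rank is counted from the singular values above a small tolerance.

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])          # illustrative 4 by 3 matrix of rank 2

# Full SVD: U is m by m, Vt is n by n, sigma holds min(m, n) values in descending order.
U, sigma, Vt = np.linalg.svd(A, full_matrices=True)
r = int(np.sum(sigma > 1e-10 * sigma[0]))
Sigma = np.zeros(A.shape)                # m by n, with zero rows and columns added
Sigma[:len(sigma), :len(sigma)] = np.diag(sigma)

# Reduced SVD keeps only the first r columns of U and V.
U_r, Sigma_r, Vt_r = U[:, :r], np.diag(sigma[:r]), Vt[:r, :]

# Sum of r rank-one pieces sigma_i * u_i * v_i^T.
pieces = sum(sigma[i] * np.outer(U[:, i], Vt[i, :]) for i in range(r))
print(r, sigma)
```

All three reconstructions (full, reduced, and the sum of pieces) should reproduce A up to roundoff.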


Example 1 When is $U\Sigma V^T$ (singular values) the same as $S\Lambda S^{-1}$ (eigenvalues)?


Solution We need orthonormal eigenvectors in $S = U$. We need nonnegative eigenvalues
in $\Lambda = \Sigma$. So $A$ must be a positive semidefinite (or definite) symmetric matrix $Q\Lambda Q^T$.


Example 2 If $A = xy^T$ with unit vectors $x$ and $y$, what is the SVD of $A$?


Solution The reduced SVD in (2) is exactly $xy^T$, with rank $r = 1$. It has $u_1 = x$ and
$v_1 = y$ and $\sigma_1 = 1$. For the full SVD, complete $u_1 = x$ to an orthonormal basis
of $u$’s, and complete $v_1 = y$ to an orthonormal basis of $v$’s. No new $\sigma$’s.


I will describe an application before proving that $Av_i = \sigma_iu_i$. This key equation gave
the diagonalizations (2) and (3) and (4) of the SVD: $A = U\Sigma V^T$.


Image Compression 


Unusually, I am going to stop the theory and describe applications. This is the century of 
data, and often that data is stored in a matrix. A digital image is really a matrix of pixel 
values. Each little picture element or “pixel” has a gray scale number between black and 
white (it has three numbers for a color picture). The picture might have $512 = 2^9$ pixels
in each row and $256 = 2^8$ pixels down each column. We have a 256 by 512 pixel matrix
with $2^{17}$ entries! To store one picture, the computer has no problem. But a CT or MR
scan produces an image at every cross section—a ton of data. If the pictures are frames in 
a movie, 30 frames a second means 108,000 images per hour. Compression is especially 
needed for high definition digital TV, or the equipment could not keep up in real time. 


What is compression? We want to replace those $2^{17}$ matrix entries by a smaller number,
without losing picture quality. A simple way would be to use larger pixels—replace groups 
of four pixels by their average value. This is 4 : 1 compression. But if we carry it further, 
like 16 : 1, our image becomes “blocky”. We want to replace the mn entries by a smaller 
number, in a way that the human visual system won’t notice. 

Compression is a billion dollar problem and everyone has ideas. Later in this book I 
will describe Fourier transforms (used in jpeg) and wavelets (now in JPEG2000). Here 
we try an SVD approach: Replace the 256 by 512 pixel matrix by a matrix of rank one: 
a column times a row. If this is successful, the storage requirement becomes 256 + 512 
(add instead of multiply). The compression ratio (256)(512)/(256 + 512) is better than 
170 to 1. This is more than we hope for. We may actually use five matrices of rank one 
(so a matrix approximation of rank 5). The compression is still 34 : 1 and the crucial 
question is the picture quality. 
Where does the SVD come in? The best rank one approximation to $A$ is the matrix
$\sigma_1u_1v_1^T$. It uses the largest singular value $\sigma_1$. The best rank 5 approximation includes also
$\sigma_2u_2v_2^T + \cdots + \sigma_5u_5v_5^T$. The SVD puts the pieces of $A$ in descending order.
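The storage count and the low rank approximation can both be sketched in code. The example below assumes Python with NumPy; the function names are invented for this illustration, and the "image" is synthetic data (a smooth gradient plus a little noise), not a real picture. The count follows the text in storing one column and one row per rank-one piece.

```python
import numpy as np

def rank_k_approximation(A, k):
    """Keep the k largest singular values: sigma_1 u_1 v_1^T + ... + sigma_k u_k v_k^T."""
    U, sigma, Vt = np.linalg.svd(A, full_matrices=False)
    return (U[:, :k] * sigma[:k]) @ Vt[:k, :]

def compression_ratio(m, n, k):
    """Entries in the full matrix divided by entries in k columns plus k rows."""
    return (m * n) / (k * (m + n))

ratio_rank1 = compression_ratio(256, 512, 1)   # about 170.7
ratio_rank5 = compression_ratio(256, 512, 5)   # about 34.1

# A synthetic 256 by 512 "image", not real data.
rng = np.random.default_rng(0)
rows = np.linspace(0.0, 1.0, 256)[:, None]
cols = np.linspace(0.0, 1.0, 512)[None, :]
image = rows * cols + 0.01 * rng.standard_normal((256, 512))
approx = rank_k_approximation(image, 5)
relative_error = np.linalg.norm(image - approx) / np.linalg.norm(image)
print(ratio_rank1, ratio_rank5, relative_error)
```

For this synthetic matrix the rank 5 approximation is very close, because the underlying gradient has rank one. A natural image would typically need more pieces for the same quality.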


A library compresses a different matrix. The rows correspond to key words. Columns
correspond to titles in the library. The entry in this word-title matrix is $a_{ij} = 1$ if word
$i$ is in title $j$ (otherwise $a_{ij} = 0$). We normalize the columns so long titles don’t get an
advantage. We might use a table of contents or an abstract. (Other books might share the
title “Introduction to Linear Algebra”.) Instead of $a_{ij} = 1$, the entries of $A$ can include the
frequency of the search words. See Section 8.6 for the SVD in statistics.


Once the indexing matrix is created, the search is a linear algebra problem. This giant 
matrix has to be compressed. The SVD approach gives an optimal low rank approximation, 
better for library matrices than for natural images. There is an ever-present tradeoff in the 
cost to compute the u’s and v’s. We still need a better way (with sparse matrices). 


The Bases and the SVD 


Start with a 2 by 2 matrix. Let its rank be $r = 2$, so $A$ is invertible. We want $v_1$ and $v_2$ to
be perpendicular unit vectors. We also want $Av_1$ and $Av_2$ to be perpendicular. (This is the
tricky part. It is what makes the bases special.) Then the unit vectors $u_1 = Av_1/\|Av_1\|$
and $u_2 = Av_2/\|Av_2\|$ will be orthonormal. Here is a specific example:


$$\text{Unsymmetric matrix}\qquad A = \begin{bmatrix} 2 & 2 \\ -1 & 1 \end{bmatrix}. \qquad (5)$$


No orthogonal matrix $Q$ will make $Q^{-1}AQ$ diagonal. We need $U^{-1}AV$. The two bases
will be different; one basis cannot do it. The output is $Av_1 = \sigma_1u_1$ when the input is $v_1$.
The “singular values” $\sigma_1$ and $\sigma_2$ are the lengths $\|Av_1\|$ and $\|Av_2\|$.


There is a neat way to remove $U$ and see $V$ by itself. Multiply $A^T$ times $A$.


$$A^TA = (U\Sigma V^T)^T(U\Sigma V^T) = V\Sigma^T\Sigma V^T. \qquad (7)$$


$U^TU$ disappears because it equals $I$. (We require $u_1^Tu_1 = 1 = u_2^Tu_2$ and $u_1^Tu_2 = 0$.)
Multiplying those diagonal $\Sigma^T$ and $\Sigma$ gives $\sigma_1^2$ and $\sigma_2^2$. That leaves an ordinary
diagonalization of the crucial symmetric matrix $A^TA$, whose eigenvalues are $\sigma_1^2$ and $\sigma_2^2$:


$$\begin{array}{l}\text{Eigenvalues } \sigma_1^2, \sigma_2^2 \\ \text{Eigenvectors } v_1, v_2\end{array} \qquad A^TA = V \begin{bmatrix} \sigma_1^2 & 0 \\ 0 & \sigma_2^2 \end{bmatrix} V^T. \qquad (8)$$


This is exactly like $A = Q\Lambda Q^T$. But the symmetric matrix is not $A$ itself. Now the
symmetric matrix is $A^TA$! And the columns of $V$ are the eigenvectors of $A^TA$. Last is $U$:
Compute the eigenvectors $v$ and eigenvalues $\sigma^2$ of $A^TA$. Then each $u = Av/\sigma$.
For large matrices LAPACK finds a special way to avoid multiplying $A^TA$ in svd(A).


Figure 6.8: $U$ and $V$ are rotations and reflections. $\Sigma$ stretches circle to ellipse.


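Example 3 Find the singular value decomposition of that matrix $A = \begin{bmatrix} 2 & 2 \\ -1 & 1 \end{bmatrix}$.


Solution Compute $A^TA$ and its eigenvectors. Then make them unit vectors:


$$A^TA = \begin{bmatrix} 5 & 3 \\ 3 & 5 \end{bmatrix} \quad\text{has unit eigenvectors}\quad v_1 = \begin{bmatrix} 1/\sqrt{2} \\ 1/\sqrt{2} \end{bmatrix} \quad\text{and}\quad v_2 = \begin{bmatrix} -1/\sqrt{2} \\ 1/\sqrt{2} \end{bmatrix}.$$


The eigenvalues of $A^TA$ are 8 and 2. The $v$’s are perpendicular, because eigenvectors of
every symmetric matrix are perpendicular, and $A^TA$ is automatically symmetric.
Now the $u$’s are quick to find, because $Av_1$ is going to be in the direction of $u_1$:


$$Av_1 = \begin{bmatrix} 2 & 2 \\ -1 & 1 \end{bmatrix} \begin{bmatrix} 1/\sqrt{2} \\ 1/\sqrt{2} \end{bmatrix} = \begin{bmatrix} 2\sqrt{2} \\ 0 \end{bmatrix}. \quad\text{The unit vector is } u_1 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}.$$


Clearly $Av_1$ is the same as $2\sqrt{2}\,u_1$. The first singular value is $\sigma_1 = 2\sqrt{2}$. Then $\sigma_1^2 = 8$.


$$Av_2 = \begin{bmatrix} 2 & 2 \\ -1 & 1 \end{bmatrix} \begin{bmatrix} -1/\sqrt{2} \\ 1/\sqrt{2} \end{bmatrix} = \begin{bmatrix} 0 \\ \sqrt{2} \end{bmatrix}. \quad\text{The unit vector is } u_2 = \begin{bmatrix} 0 \\ 1 \end{bmatrix}.$$


Now $Av_2$ is $\sqrt{2}\,u_2$ and $\sigma_2 = \sqrt{2}$. Thus $\sigma_2^2$ agrees with the other eigenvalue 2 of $A^TA$.

$$A = U\Sigma V^T \quad\text{is}\quad \begin{bmatrix} 2 & 2 \\ -1 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 2\sqrt{2} & 0 \\ 0 & \sqrt{2} \end{bmatrix} \begin{bmatrix} 1/\sqrt{2} & 1/\sqrt{2} \\ -1/\sqrt{2} & 1/\sqrt{2} \end{bmatrix}. \qquad (9)$$


This matrix, and every invertible 2 by 2 matrix, transforms the unit circle to an ellipse.
You can see that in the figure, which was created by Cliff Long and Tom Hern.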
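The recipe (eigenvectors of A^TA first, then each u = Av/sigma) can be followed step by step in code. This sketch assumes Python with NumPy and takes A = [2 2; -1 1]. The library may return eigenvectors with opposite signs, which changes the signs of matching columns of U and V but not the product.

```python
import numpy as np

A = np.array([[2.0, 2.0],
              [-1.0, 1.0]])

# eigh returns eigenvalues in ascending order, so reorder to put sigma_1 first.
eigenvalues, V = np.linalg.eigh(A.T @ A)
order = np.argsort(eigenvalues)[::-1]
eigenvalues, V = eigenvalues[order], V[:, order]

sigma = np.sqrt(eigenvalues)            # 2*sqrt(2) and sqrt(2)
U = (A @ V) / sigma                     # column i is A v_i / sigma_i
reconstructed = U @ np.diag(sigma) @ V.T
print(eigenvalues, sigma)
print(U)
```

This is fine for a 2 by 2 teaching example. For large matrices, forming A^TA explicitly is avoided in practice, as the text notes about LAPACK.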
One final point about that example. We found the $u$’s from the $v$’s. Could we find the
$u$’s directly? Yes, by multiplying $AA^T$ instead of $A^TA$:


$$\text{Use } V^TV = I \qquad AA^T = (U\Sigma V^T)(V\Sigma^TU^T) = U\Sigma\Sigma^TU^T. \qquad (10)$$


Multiplying $\Sigma\Sigma^T$ gives $\sigma_1^2$ and $\sigma_2^2$ as before. The $u$’s are eigenvectors of $AA^T$:


$$\text{Diagonal in this example}\qquad AA^T = \begin{bmatrix} 2 & 2 \\ -1 & 1 \end{bmatrix} \begin{bmatrix} 2 & -1 \\ 2 & 1 \end{bmatrix} = \begin{bmatrix} 8 & 0 \\ 0 & 2 \end{bmatrix}.$$


The eigenvectors $(1, 0)$ and $(0, 1)$ agree with $u_1$ and $u_2$ found earlier. Why take the first
eigenvector to be $(1, 0)$ instead of $(-1, 0)$ or $(0, 1)$? Because we have to follow $Av_1$
(I missed that in my video lecture ...). Notice that $AA^T$ has the same eigenvalues
(8 and 2) as $A^TA$. The singular values are $\sqrt{8}$ and $\sqrt{2}$.


Example 4 Find the SVD of the singular matrix $A = \begin{bmatrix} 2 & 2 \\ 1 & 1 \end{bmatrix}$. The rank is $r = 1$.


Solution The row space has only one basis vector $v_1 = (1, 1)/\sqrt{2}$. The column space
has only one basis vector $u_1 = (2, 1)/\sqrt{5}$. Then $Av_1 = (4, 2)/\sqrt{2}$ must equal $\sigma_1u_1$.
It does, with $\sigma_1 = \sqrt{10}$.


Figure 6.9: The SVD chooses orthonormal bases for 4 subspaces so that $Av_i = \sigma_iu_i$.
(Diagram labels: row space with $v_1$, nullspace with $v_2$, column space with $u_1$, nullspace of $A^T$ with $u_2$; $Av_1 = \sigma_1u_1$ and $Av_2 = 0$.)


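The SVD could stop after the row space and column space (it usually doesn’t). It is
customary for $U$ and $V$ to be square. The matrices need a second column. The vector
$v_2$ is in the nullspace. It is perpendicular to $v_1$ in the row space. Multiply by $A$ to get
$Av_2 = 0$. We could say that the second singular value is $\sigma_2 = 0$, but singular values are
like pivots: only the $r$ nonzeros are counted.


$$\begin{array}{l} A = U\Sigma V^T \\ \text{Full size} \end{array} \qquad \begin{bmatrix} 2 & 2 \\ 1 & 1 \end{bmatrix} = \frac{1}{\sqrt{5}} \begin{bmatrix} 2 & 1 \\ 1 & -2 \end{bmatrix} \begin{bmatrix} \sqrt{10} & 0 \\ 0 & 0 \end{bmatrix} \frac{1}{\sqrt{2}} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}. \qquad (11)$$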
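The full-size factorization of this singular matrix can be multiplied out as a check. The sketch assumes Python with NumPy and types in U, Sigma, V for A = [2 2; 1 1], with u_1 = (2, 1)/sqrt(5), v_1 = (1, 1)/sqrt(2), and the second columns taken from the two nullspaces.

```python
import numpy as np

A = np.array([[2.0, 2.0],
              [1.0, 1.0]])
U = np.array([[2.0, 1.0],
              [1.0, -2.0]]) / np.sqrt(5.0)
Sigma = np.array([[np.sqrt(10.0), 0.0],
                  [0.0, 0.0]])
V = np.array([[1.0, 1.0],
              [1.0, -1.0]]) / np.sqrt(2.0)

product = U @ Sigma @ V.T
library_sigma = np.linalg.svd(A, compute_uv=False)
print(product)
print(library_sigma)                    # sqrt(10) and (numerically) zero
```

The second column of V is in the nullspace of A and the second column of U is in the nullspace of A^T, which is why they contribute nothing to the product.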
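The matrices $U$ and $V$ contain orthonormal bases for all four subspaces:

first $r$ columns of $V$:        row space of $A$
last $n - r$ columns of $V$:     nullspace of $A$
first $r$ columns of $U$:        column space of $A$
last $m - r$ columns of $U$:     nullspace of $A^T$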


The first columns $v_1, \ldots, v_r$ and $u_1, \ldots, u_r$ are eigenvectors of $A^TA$ and $AA^T$. We
now explain why $Av_i$ falls in the direction of $u_i$. The last $v$’s and $u$’s (in the nullspaces)
are easier. As long as those are orthonormal, the SVD will be correct.

Proof of the SVD: Start from $A^TAv_i = \sigma_i^2v_i$, which gives the $v$’s and $\sigma$’s. Multiplying
by $v_i^T$ leads to $\|Av_i\|^2$. To prove that $Av_i = \sigma_iu_i$, the key step is to multiply by $A$:


$$v_i^TA^TAv_i = \sigma_i^2v_i^Tv_i \quad\text{gives}\quad \|Av_i\|^2 = \sigma_i^2 \quad\text{so that}\quad \|Av_i\| = \sigma_i \qquad (12)$$


$$AA^TAv_i = \sigma_i^2Av_i \quad\text{gives}\quad u_i = Av_i/\sigma_i \quad\text{as a unit eigenvector of } AA^T. \qquad (13)$$


Equation (12) used the small trick of placing parentheses in $(v_i^TA^T)(Av_i) = \|Av_i\|^2$.
Equation (13) placed the all-important parentheses in $(AA^T)(Av_i)$. This shows that $Av_i$
is an eigenvector of $AA^T$. Divide by its length $\sigma_i$ to get the unit vector $u_i = Av_i/\sigma_i$.


These $u$’s are orthogonal because $(Av_i)^T(Av_j) = v_i^T(A^TAv_j) = v_i^T(\sigma_j^2v_j) = 0$.

I will give my opinion directly. The SVD is the climax of this linear algebra course. 
I think of it as the final step in the Fundamental Theorem. First come the dimensions of 
the four subspaces. Then their orthogonality. Then the orthonormal bases diagonalize A. 
It is all in the formula $A = U\Sigma V^T$. You have made it to the top.


Eigshow (Part 2) 


Section 6.1 described the MATLAB demo called eigshow. The first option is eig, when x 
moves in a circle and Ax follows on an ellipse. The second option is svd, when two vectors 
x and y stay perpendicular as they travel around a circle. Then Ax and Ay move too 
(not usually perpendicular). The four vectors are in the Java demo on web.mit.edu/18.06. 

The SVD is seen graphically when $Ax$ is perpendicular to $Ay$. Their directions at that
moment give an orthonormal basis $u_1, u_2$. Their lengths give the singular values $\sigma_1, \sigma_2$.
The vectors $x$ and $y$ at that same moment are the orthonormal basis $v_1, v_2$.


Searching the Web 


I will end with an application of the SVD to web search engines. When you google a word, 
you get a list of web sites in order of importance. You could try “four subspaces”. 

The HITS algorithm that we describe is one way to produce that ranked list. It begins 
with about 200 sites found from an index of key words, and after that we look only at links 
between pages. Search engines are link-based more than content-based. 

Start with the 200 sites and all sites that link to them and all sites they link to. That is 
our list, to be put in order. Importance can be measured by links out and links in. 
1. The site is an authority: links come in from many sites. Especially from hubs.


2. The site is a hub: links go out to many sites in the list. Especially to authorities. 


We want numbers $x_1, \ldots, x_N$ to rank the authorities and $y_1, \ldots, y_N$ to rank the hubs.
Start with a simple count: $x_i^0$ and $y_i^0$ count the links into and out of site $i$.

Here is the point: A good authority has links from important sites (like hubs). Links
from universities count more heavily than links from friends. A good hub is linked to
important sites (like authorities). A link to amazon.com unfortunately means more than a
link to wellesleycambridge.com. The rankings $x^0$ and $y^0$ from counting links are updated
to $x^1$ and $y^1$ by taking account of good links (measuring their quality by $x^0$ and $y^0$):


$$\text{Authority values}\quad x_i^1 = \sum_{j \text{ links to } i} y_j^0 \qquad\qquad \text{Hub values}\quad y_i^1 = \sum_{i \text{ links to } j} x_j^0 \qquad (14)$$


In matrix language those are $x^1 = A^Ty^0$ and $y^1 = Ax^0$. The matrix $A$ contains 1’s and 0’s,
with $a_{ij} = 1$ when $i$ links to $j$. In the language of graphs, $A$ is an “adjacency matrix”
for the World Wide Web (an enormous matrix). The new $x^1$ and $y^1$ give better rankings,
but not the best. Take another step like (14), to reach $x^2$ and $y^2$:


$$A^TA \text{ and } AA^T \text{ appear}\qquad x^2 = A^Ty^1 = A^TAx^0 \quad\text{and}\quad y^2 = Ax^1 = AA^Ty^0. \qquad (15)$$


In two steps we are multiplying by $A^TA$ and $AA^T$. Twenty steps will multiply by $(A^TA)^{10}$
and $(AA^T)^{10}$. When we take powers, the largest eigenvalue $\sigma_1^2$ begins to dominate. And
the vectors $x$ and $y$ line up with the leading eigenvectors $v_1$ and $u_1$ of $A^TA$ and $AA^T$.
We are computing the top terms in the SVD, by the power method that is discussed in 
Section 9.3. It is wonderful that linear algebra helps to understand the Web. 
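The updates in (14) and (15) can be run on a toy web. The sketch below is in Python with NumPy. The 4-site link matrix is hypothetical, and normalizing x and y at each step is an addition made here to keep the numbers bounded (it does not change their directions). The result is compared with the leading singular vectors from a library SVD.

```python
import numpy as np

# Hypothetical 4-site web: A[i, j] = 1 when site i links to site j.
A = np.array([[0, 1, 1, 1],
              [0, 0, 1, 1],
              [1, 0, 0, 1],
              [0, 0, 1, 0]], dtype=float)

def hits(A, steps=100):
    """Alternate x = A^T y (authorities) and y = A x (hubs), normalizing each time."""
    x = A.sum(axis=0)                   # x^0: links into each site
    y = A.sum(axis=1)                   # y^0: links out of each site
    for _ in range(steps):
        x, y = A.T @ y, A @ x           # both updates use the previous x and y
        x, y = x / np.linalg.norm(x), y / np.linalg.norm(y)
    return x, y

x, y = hits(A)
U, sigma, Vt = np.linalg.svd(A)
v1, u1 = np.abs(Vt[0]), np.abs(U[:, 0])  # leading singular vectors, made positive
print(x, v1)
print(y, u1)
```

The authority scores x line up with v_1 and the hub scores y with u_1, which is the power method at work on A^TA and AA^T.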

Google actually creates rankings by a random walk that follows web links. The more 
often this random walk goes to a site, the higher the ranking. The frequency of visits 
gives the leading eigenvector ($\lambda = 1$) of the normalized adjacency matrix for the Web.
That Markov matrix has 2.7 billion rows and columns, from 2.7 billion web sites. 

This is the largest eigenvalue problem ever solved. The excellent book by Langville and 
Meyer, Google’s PageRank and Beyond, explains in detail the science of search engines. 
See mathworks.com/company/newsletter/clevescorner/oct02_cleve.shtml 

But many of the important techniques are well-kept secrets of Google. Probably 
Google starts with last month’s eigenvector as a first approximation, and runs the random 
walk very fast. To get a high ranking, you want a lot of links from important sites. 
The HITS algorithm is described in the 1999 Scientific American (June 16). But I don’t 
think the SVD is mentioned there. . . 


■ REVIEW OF THE KEY IDEAS ■


1. The SVD factors $A$ into $U\Sigma V^T$, with $r$ singular values $\sigma_1 \ge \cdots \ge \sigma_r > 0$.
2. The numbers $\sigma_1^2, \ldots, \sigma_r^2$ are the nonzero eigenvalues of $AA^T$ and $A^TA$.
3. The orthonormal columns of $U$ and $V$ are eigenvectors of $AA^T$ and $A^TA$.
4. Those columns hold orthonormal bases for the four fundamental subspaces of $A$.
5. Those bases diagonalize the matrix: $Av_i = \sigma_iu_i$ for $i \le r$. This is $AV = U\Sigma$.


■ WORKED EXAMPLES ■


6.7 A Identify by name these decompositions $A = c_1b_1 + \cdots + c_rb_r$ of an $m$ by $n$ matrix.
Each term is a rank one matrix (column $c$ times row $b$). The rank of $A$ is $r$.


1. Orthogonal columns $c_1, \ldots, c_r$ and orthogonal rows $b_1, \ldots, b_r$.
2. Orthogonal columns $c_1, \ldots, c_r$ and triangular rows $b_1, \ldots, b_r$.
3. Triangular columns $c_1, \ldots, c_r$ and triangular rows $b_1, \ldots, b_r$.


$A = CB$ is ($m$ by $r$)($r$ by $n$). Triangular vectors $c_i$ and $b_i$ have zeros up to component $i$.
The matrix $C$ with columns $c_i$ is lower triangular, the matrix $B$ with rows $b_i$ is upper
triangular. Where do the rank and the pivots and singular values come into this picture?


Solution These three splittings $A = CB$ are basic to linear algebra, pure or applied:
1. Singular Value Decomposition $A = U\Sigma V^T$ (orthogonal $U$, orthogonal $\Sigma V^T$)
2. Gram-Schmidt Orthogonalization $A = QR$ (orthogonal $Q$, triangular $R$)
3. Gaussian Elimination $A = LU$ (triangular $L$, triangular $U$)

You might prefer to separate out the $\sigma_i$ and pivots $d_i$ and heights $h_i$:
1. $A = U\Sigma V^T$ with unit vectors in $U$ and $V$. The singular values are in $\Sigma$.
2. $A = QHR$ with unit vectors in $Q$ and diagonal 1’s in $R$. The heights $h_i$ are in $H$.
3. $A = LDU$ with diagonal 1’s in $L$ and $U$. The pivots are in $D$.


Each $h_i$ tells the height of column $i$ above the base from earlier columns. The volume
of the full n-dimensional box ($r = m = n$) comes from $A = U\Sigma V^T = LDU = QHR$:


$$|\det A| = |\text{product of } \sigma\text{’s}| = |\text{product of } d\text{’s}| = |\text{product of } h\text{’s}|.$$


6.7 B For $A = xy^T$ of rank one (2 by 2), compare $A = U\Sigma V^T$ with $A = S\Lambda S^{-1}$.


Comment This started as an exam problem in 2007. It led further and became
interesting. Now there is an essay called “The Four Fundamental Subspaces: 4 Lines”
on web.mit.edu/18.06. The Jordan form enters when $y^Tx = 0$ and $\lambda = 0$ is repeated.
6.7 C Show that $\sigma_1 \ge |\lambda|_{\max}$. The largest singular value dominates all eigenvalues.
Show that $\sigma_1 \ge |a_{ij}|_{\max}$. The largest singular value dominates all entries of $A$.


Solution Start from $A = U\Sigma V^T$. Remember that multiplying by an orthogonal matrix
does not change length: $\|Qx\| = \|x\|$ because $\|Qx\|^2 = x^TQ^TQx = x^Tx = \|x\|^2$.
This applies to $Q = U$ and $Q = V^T$. In between is the diagonal matrix $\Sigma$.


$$\|Ax\| = \|U\Sigma V^Tx\| = \|\Sigma V^Tx\| \le \sigma_1\|V^Tx\| = \sigma_1\|x\|. \qquad (16)$$


An eigenvector has $\|Ax\| = |\lambda|\,\|x\|$. So (16) says that $|\lambda|\,\|x\| \le \sigma_1\|x\|$. Then $|\lambda| \le \sigma_1$.
Apply also to the unit vector $x = (1, 0, \ldots, 0)$. Now $Ax$ is the first column of $A$. Then
by inequality (16), this column has length $\le \sigma_1$. Every entry must have magnitude $\le \sigma_1$.

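Example 5 Estimate the singular values $\sigma_1$ and $\sigma_2$ of $A$ and $A^{-1}$:


\[ \text{Eigenvalues} = 1 \qquad A = \begin{bmatrix} 1 & 0 \\ C & 1 \end{bmatrix} \quad \text{and} \quad A^{-1} = \begin{bmatrix} 1 & 0 \\ -C & 1 \end{bmatrix}. \tag{17} \]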


Solution The length of the first column is $\sqrt{1 + C^2} \le \sigma_1$, from the reasoning above.
This confirms that $\sigma_1 \ge 1$ and $\sigma_1 \ge C$. Then $\sigma_1$ dominates the eigenvalues 1, 1 and the
entry $C$. If $C$ is very large then $\sigma_1$ is much bigger than the eigenvalues.


This matrix $A$ has determinant = 1. $A^TA$ also has determinant = 1 and then $\sigma_1\sigma_2 = 1$.
For this matrix, $\sigma_1 \ge 1$ and $\sigma_1 \ge C$ lead to $\sigma_2 \le 1$ and $\sigma_2 \le 1/C$.


Conclusion: If $C = 1000$ then $\sigma_1 \ge 1000$ and $\sigma_2 \le 1/1000$. A is ill-conditioned,
slightly sick. Inverting A is easy by algebra, but solving Ax = b by elimination could be
dangerous. A is close to a singular matrix even though both eigenvalues are $\lambda = 1$. By
slightly changing the 1, 2 entry from zero to $1/C = 1/1000$, the matrix becomes singular.


Section 9.2 will explain how the ratio $\sigma_{\max}/\sigma_{\min}$ governs the roundoff error in
elimination. MATLAB warns you if this “condition number” is large. Here $\sigma_1/\sigma_2 \ge C^2$.
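

The estimates in Example 5 can be checked numerically. The sketch below is not from the text: it assumes a hypothetical helper `singular_values_2x2` that reads the singular values of a 2 by 2 matrix from the trace and determinant of $A^TA$, and it uses $C = 1000$ as in the conclusion above.

```python
import math

def singular_values_2x2(a, b, c, d):
    """Singular values of [[a, b], [c, d]] from the eigenvalues of A^T A."""
    # Only the trace and determinant of the symmetric matrix A^T A are needed.
    trace = a * a + b * b + c * c + d * d      # sigma_1^2 + sigma_2^2
    det = (a * d - b * c) ** 2                 # sigma_1^2 * sigma_2^2
    sigma1_sq = (trace + math.sqrt(trace * trace - 4 * det)) / 2
    if sigma1_sq == 0:                         # zero matrix
        return 0.0, 0.0
    sigma2_sq = det / sigma1_sq                # avoids cancellation in trace - sqrt(...)
    return math.sqrt(sigma1_sq), math.sqrt(sigma2_sq)

C = 1000.0
sigma1, sigma2 = singular_values_2x2(1.0, 0.0, C, 1.0)   # A = [1 0; C 1]
condition = sigma1 / sigma2

print("sigma1 =", sigma1)
print("sigma2 =", sigma2)
print("sigma1 / sigma2 =", condition)
```

Both eigenvalues of this A are 1, yet the printed ratio exceeds $C^2$: the singular values, not the eigenvalues, reveal how close A is to singular.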


Problem Set 6.7 
Problems 1-3 compute the SVD of a square singular matrix A. 
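1 Find the eigenvalues and unit eigenvectors $v_1, v_2$ of $A^TA$. Then find $u_1 = Av_1/\sigma_1$:

\[ A = \begin{bmatrix} 1 & 2 \\ 3 & 6 \end{bmatrix} \quad \text{and} \quad A^TA = \begin{bmatrix} 10 & 20 \\ 20 & 40 \end{bmatrix} \quad \text{and} \quad AA^T = \begin{bmatrix} 5 & 15 \\ 15 & 45 \end{bmatrix}. \]


Verify that $u_1$ is a unit eigenvector of $AA^T$. Complete the matrices $U, \Sigma, V$.


\[ \text{SVD} \qquad \begin{bmatrix} 1 & 2 \\ 3 & 6 \end{bmatrix} = \begin{bmatrix} u_1 & u_2 \end{bmatrix} \begin{bmatrix} \sigma_1 & 0 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} v_1 & v_2 \end{bmatrix}^T. \]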
2 Write down orthonormal bases for the four fundamental subspaces of this A.


3 (a) Why is the trace of $A^TA$ equal to the sum of all $a_{ij}^2$?


(b) For every rank-one matrix, why is $\sigma_1^2$ = sum of all $a_{ij}^2$?


Problems 4-7 ask for the SVD of matrices of rank 2. 


4 Find the eigenvalues and unit eigenvectors of $A^TA$ and $AA^T$. Keep each $Av = \sigma u$:


\[ \text{Fibonacci matrix} \qquad A = \begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix} \]


Construct the singular value decomposition and verify that A equals $U\Sigma V^T$.
5 Use the svd part of the MATLAB demo eigshow to find those v’s graphically. 


6 Compute $A^TA$ and $AA^T$ and their eigenvalues and unit eigenvectors for V and U.


Rectangular matrix A= | k i A |. 


Check $AV = U\Sigma$ (this will decide $\pm$ signs in U). $\Sigma$ has the same shape as A.


7 What is the closest rank-one approximation to that 2 by 3 matrix? 


8 A square invertible matrix has $A^{-1} = V\Sigma^{-1}U^T$. This says that the singular values
of $A^{-1}$ are $1/\sigma(A)$. Show that $\sigma_{\max}(A^{-1})\,\sigma_{\max}(A) \ge 1$.


9 Suppose $u_1,\ldots,u_n$ and $v_1,\ldots,v_n$ are orthonormal bases for $\mathbf{R}^n$. Construct the
matrix A that transforms each $v_j$ into $u_j$ to give $Av_1 = u_1, \ldots, Av_n = u_n$.


10 Construct the matrix with rank one that has $Av = 12u$ for $v = \frac{1}{2}(1,1,1,1)$ and
$u = \frac{1}{3}(2,2,1)$. Its only singular value is $\sigma_1$ = ____.


11 Suppose A has orthogonal columns $w_1, w_2, \ldots, w_n$ of lengths $\sigma_1, \sigma_2, \ldots, \sigma_n$.
What are $U$, $\Sigma$, and $V$ in the SVD?


12 Suppose A is a 2 by 2 symmetric matrix with unit eigenvectors $u_1$ and $u_2$. If its
eigenvalues are $\lambda_1 = 3$ and $\lambda_2 = -2$, what are the matrices $U, \Sigma, V^T$ in its SVD?


13 If A = QR with an orthogonal matrix Q, the SVD of A is almost the same as the 
SVD of R. Which of the three matrices $U, \Sigma, V$ is changed because of Q?


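14 Suppose A is invertible (with $\sigma_1 > \sigma_2 > 0$). Change A by as small a matrix as
possible to produce a singular matrix $A_0$. Hint: U and V do not change:


\[ \text{From} \quad A = \begin{bmatrix} u_1 & u_2 \end{bmatrix} \begin{bmatrix} \sigma_1 & 0 \\ 0 & \sigma_2 \end{bmatrix} \begin{bmatrix} v_1 & v_2 \end{bmatrix}^T \quad \text{find the nearest } A_0. \]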
15 Why doesn’t the SVD for $A + I$ just use $\Sigma + I$?


Challenge Problems


16 (Search engine) Run a random walk x(2),...,x(n) starting from web site x(1) = 1.
Count the visits to each site. At each step the code chooses the next website x(k)
with probabilities given by column x(k − 1) of A. At the end, p gives the fraction
of time at each site from a histogram: count visits. The rankings are based on p.


Please compare p to the steady state eigenvector of the Markov matrix A:

A = [0 .1 .2 .7; .05 0 .15 .8; .15 .25 0 .6; .1 .3 .6 0]'
n = 100; x = zeros(1,n); x(1) = 1;
for k = 2:n x(k) = min(find(rand < cumsum(A(:, x(k-1))))); end
p = hist(x,1:4)/n


17 The 1, −1 first difference matrix A has $A^TA$ = second difference matrix.
The singular vectors of A are sine vectors v and cosine vectors u. Then $Av = \sigma u$ is
the discrete form of $d/dx(\sin cx) = c(\cos cx)$. This is the best SVD I have seen.


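\[ \text{SVD of } A \qquad A = \begin{bmatrix} 1 & 0 & 0 \\ -1 & 1 & 0 \\ 0 & -1 & 1 \\ 0 & 0 & -1 \end{bmatrix} \qquad A^TA = \begin{bmatrix} 2 & -1 & 0 \\ -1 & 2 & -1 \\ 0 & -1 & 2 \end{bmatrix} \]

\[ \text{Orthogonal sine matrix} \qquad V = \frac{1}{\sqrt{2}} \begin{bmatrix} \sin \pi/4 & \sin 2\pi/4 & \sin 3\pi/4 \\ \sin 2\pi/4 & \sin 4\pi/4 & \sin 6\pi/4 \\ \sin 3\pi/4 & \sin 6\pi/4 & \sin 9\pi/4 \end{bmatrix} \]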


(a) Put numbers in V: The unit eigenvectors of $A^TA$ are singular vectors of A.
Show that the columns of V have $A^TAv = \lambda v$ with $\lambda = 2 - \sqrt{2},\ 2,\ 2 + \sqrt{2}$.


(b) Multiply AV and verify that its columns are orthogonal. They are $\sigma_1u_1$ and
$\sigma_2u_2$ and $\sigma_3u_3$. The first columns of the cosine matrix U are $u_1, u_2, u_3$.


(c) Since A is 4 by 3, we need a fourth orthogonal vector $u_4$. It comes from the
nullspace of $A^T$. What is $u_4$?


The cosine vectors in U are eigenvectors of $AA^T$. The fourth cosine is (1, 1, 1, 1)/2.


\[ AA^T = \begin{bmatrix} 1 & -1 & 0 & 0 \\ -1 & 2 & -1 & 0 \\ 0 & -1 & 2 & -1 \\ 0 & 0 & -1 & 1 \end{bmatrix} \qquad U = \frac{1}{\sqrt{2}} \begin{bmatrix} \cos \pi/8 & \cos 2\pi/8 & \cos 3\pi/8 \\ \cos 3\pi/8 & \cos 6\pi/8 & \cos 9\pi/8 \\ \cos 5\pi/8 & \cos 10\pi/8 & \cos 15\pi/8 \\ \cos 7\pi/8 & \cos 14\pi/8 & \cos 21\pi/8 \end{bmatrix} \]


Those angles $\pi/8, 3\pi/8, 5\pi/8, 7\pi/8$ fit 4 points with spacing $\pi/4$ between 0 and
$\pi$. The sine transform has three points $\pi/4, 2\pi/4, 3\pi/4$. The full cosine transform
includes $u_4$ from the “zero frequency” or direct current eigenvector (1, 1, 1, 1).


The 8 by 8 cosine transform in 2D is the workhorse of jpeg compression. Linear 
algebra (circulant, Toeplitz, orthogonal matrices) is at the heart of signal processing. 
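

As a numerical companion to the sine vectors above, here is a minimal sketch (not from the text) that checks $A^TAv = \lambda v$ for the three columns of V. The helper names `sine_vector` and `matvec` are hypothetical; the eigenvalue formula $2 - 2\cos(k\pi/4)$ is the tridiagonal −1, 2, −1 formula specialized to $n = 3$.

```python
import math

# A^T A for the 4 by 3 first difference matrix: the -1, 2, -1 second difference matrix.
K = [[2, -1, 0],
     [-1, 2, -1],
     [0, -1, 2]]

def sine_vector(k):
    """Column k of V: (sin k*pi/4, sin 2k*pi/4, sin 3k*pi/4) / sqrt(2)."""
    return [math.sin(j * k * math.pi / 4) / math.sqrt(2) for j in (1, 2, 3)]

def matvec(M, x):
    """Matrix times vector, row by row."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

results = []
for k in (1, 2, 3):
    v = sine_vector(k)
    Kv = matvec(K, v)
    lam = 2 - 2 * math.cos(k * math.pi / 4)            # expected eigenvalue
    residual = max(abs(Kv[i] - lam * v[i]) for i in range(3))
    length = math.sqrt(sum(x * x for x in v))          # should be 1
    results.append((lam, residual, length))
    print(k, lam, residual, length)
```

The three values of `lam` are $2 - \sqrt{2}$, $2$, $2 + \sqrt{2}$, and the residuals stay at roundoff level, so each sine vector is a unit eigenvector of $A^TA$.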
Table of Eigenvalues and Eigenvectors


How are the properties of a matrix reflected in its eigenvalues and eigenvectors?
This question is fundamental throughout Chapter 6. A table that organizes the key facts may
be helpful. Here are the special properties of the eigenvalues $\lambda_i$ and the eigenvectors $x_i$.


Matrix                                        Eigenvalues                          Eigenvectors

Symmetric: $A^T = A$                          real $\lambda$’s                     orthogonal $x_i^Tx_j = 0$
Orthogonal: $Q^T = Q^{-1}$                    all $|\lambda| = 1$                  orthogonal $\bar{x}_i^Tx_j = 0$
Skew-symmetric: $A^T = -A$                    imaginary $\lambda$’s                orthogonal $\bar{x}_i^Tx_j = 0$
Complex Hermitian: $\bar{A}^T = A$            real $\lambda$’s                     orthogonal $\bar{x}_i^Tx_j = 0$
Positive Definite: $x^TAx > 0$                all $\lambda > 0$                    orthogonal since $A^T = A$
Markov: $m_{ij} > 0$, $\sum_{i=1}^n m_{ij} = 1$   $\lambda_{\max} = 1$             steady state $x > 0$
Similar: $B = M^{-1}AM$                       $\lambda(B) = \lambda(A)$            $x(B) = M^{-1}x(A)$
Projection: $P = P^2 = P^T$                   $\lambda = 1;\ 0$                    column space; nullspace
Plane Rotation                                $e^{i\theta}$ and $e^{-i\theta}$     $x = (1, i)$ and $(1, -i)$
Reflection: $I - 2uu^T$                       $\lambda = -1;\ 1,..,1$              $u$; whole plane $u^\perp$
Rank One: $uv^T$                              $\lambda = v^Tu;\ 0,..,0$            $u$; whole plane $v^\perp$
Inverse: $A^{-1}$                             $1/\lambda(A)$                       keep eigenvectors of A
Shift: $A + cI$                               $\lambda(A) + c$                     keep eigenvectors of A
Stable Powers: $A^n \to 0$                    all $|\lambda| < 1$                  any eigenvectors
Stable Exponential: $e^{At} \to 0$            all Re $\lambda < 0$                 any eigenvectors
Cyclic Permutation: row 1 of $I$ last         $\lambda_k = e^{2\pi ik/n}$          $x_k = (1, \lambda_k, \ldots, \lambda_k^{n-1})$
Tridiagonal: −1, 2, −1 on diagonals           $\lambda_k = 2 - 2\cos\frac{k\pi}{n+1}$   $x_k = (\sin\frac{k\pi}{n+1}, \sin\frac{2k\pi}{n+1}, \ldots)$
Diagonalizable: $A = S\Lambda S^{-1}$         diagonal of $\Lambda$                columns of S are independent
Symmetric: $A = Q\Lambda Q^T$                 diagonal of $\Lambda$ (real)         columns of Q are orthonormal
Schur: $A = QTQ^{-1}$                         diagonal of $T$                      columns of Q if $A^TA = AA^T$
Jordan: $J = M^{-1}AM$                        diagonal of $J$                      each block gives $x = (0,..,1,..,0)$
Rectangular: $A = U\Sigma V^T$                rank(A) = rank($\Sigma$)             eigenvectors of $A^TA$, $AA^T$ in V, U


Chapter 7 


Linear Transformations 


7.1 The idea of a Linear Transformation 


When a matrix A multiplies a vector v, it “transforms” v into another vector Av. 
In goes v, out comes T(v) = Av. A transformation T follows the same idea as a function. 
In goes a number x, out comes f(x). For one vector v or one number x, we multiply 
by the matrix or we evaluate the function. The deeper goal is to see all v’s at once. We are 
transforming the whole space V when we multiply every v by A. 

Start again with a matrix A. It transforms v to Av. It transforms w to Aw. Then we 
know what happens to u = v + w. There is no doubt about Au, it has to equal Av + Aw. 
Matrix multiplication T (v) = Av gives a linear transformation: 


A transformation T assigns an output T(v) to each input vector v in V.
The transformation is linear if it meets these requirements for all v and w:
(a) T(v + w) = T(v) + T(w)   (b) T(cv) = cT(v) for all c.
If the input is v = 0, the output must be T(v) = 0. We combine (a) and (b) into one:

Linear transformation   T(cv + dw) must equal cT(v) + dT(w).

Again I can test matrix multiplication for linearity: A(cv + dw) = cAv + dAw is true. 
A linear transformation is highly restricted. Suppose T adds $u_0$ to every vector.

Then $T(v) = v + u_0$ and $T(w) = w + u_0$. This isn’t good, or at least it isn’t linear.

Applying T to v + w produces $v + w + u_0$. That is not the same as T(v) + T(w):


Shift is not linear   $v + w + u_0$ is not $T(v) + T(w) = v + u_0 + w + u_0$.
The exception is when $u_0 = 0$. The transformation reduces to T(v) = v. This is the


identity transformation (nothing moves, as in multiplication by the identity matrix). 
That is certainly linear. In this case the input space V is the same as the output space W. 
The linear-plus-shift transformation $T(v) = Av + u_0$ is called “affine”. Straight lines
stay straight although T is not linear. Computer graphics works with affine transformations 
in Section 8.6, because we must be able to move images. 
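

The linearity test $T(cv + dw) = cT(v) + dT(w)$ can be tried on both kinds of transformation. This is only a sketch: the matrix `A`, the shift `u0`, and the helper `looks_linear` are hypothetical choices made for illustration, and passing the test for one choice of v, w, c, d does not prove linearity (failing it does disprove it).

```python
def matvec(A, v):
    """Matrix times vector, returned as a tuple."""
    return tuple(sum(a * x for a, x in zip(row, v)) for row in A)

def add(v, w):
    return tuple(x + y for x, y in zip(v, w))

def scale(c, v):
    return tuple(c * x for x in v)

def looks_linear(T, v, w, c, d):
    """Compare T(cv + dw) with cT(v) + dT(w) for one choice of v, w, c, d."""
    left = T(add(scale(c, v), scale(d, w)))
    right = add(scale(c, T(v)), scale(d, T(w)))
    return left == right

A = [[2, 1], [1, 3]]      # hypothetical matrix; integers keep the arithmetic exact
u0 = (1, 0)               # hypothetical shift vector

T_matrix = lambda v: matvec(A, v)             # T(v) = Av
T_affine = lambda v: add(matvec(A, v), u0)    # T(v) = Av + u0

v, w = (1, 2), (3, -1)
print(looks_linear(T_matrix, v, w, 2, -5))    # True
print(looks_linear(T_affine, v, w, 2, -5))    # False
print(T_affine((0, 0)))                       # (1, 0): the zero vector is moved
```

The affine map fails because the right side picks up $(c + d)u_0$ instead of $u_0$. It would pass only for combinations with $c + d = 1$, which is why straight lines still stay straight.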


Example 1 Choose a fixed vector a = (1, 3, 4), and let T(v) be the dot product $a \cdot v$:
The input is $v = (v_1, v_2, v_3)$. The output is $T(v) = a \cdot v = v_1 + 3v_2 + 4v_3$.


This is linear. The inputs v come from three-dimensional space, so $V = \mathbf{R}^3$. The outputs
are just numbers, so the output space is $W = \mathbf{R}^1$. We are multiplying by the row matrix
A = [1 3 4]. Then T(v) = Av.

You will get good at recognizing which transformations are linear. If the output involves 
squares or products or lengths, $v_1^2$ or $v_1v_2$ or $\|v\|$, then T is not linear.


Example 2 The length $T(v) = \|v\|$ is not linear. Requirement (a) for linearity would be
$\|v + w\| = \|v\| + \|w\|$. Requirement (b) would be $\|cv\| = c\|v\|$. Both are false!

Not (a): The sides of a triangle satisfy an inequality $\|v + w\| \le \|v\| + \|w\|$.

Not (b): The length $\|-v\|$ is not $-\|v\|$. For negative c, we fail.


Example 3 (Important) T is the transformation that rotates every vector by 30°. The 
“domain” is the xy plane (all input vectors v). The “range” is also the xy plane (all rotated 
vectors T(v)). We described T without a matrix: rotate by 30°. 

Is rotation linear? Yes it is. We can rotate two vectors and add the results. The sum of 
rotations T (v) + T(w) is the same as the rotation T(v + w) of the sum. The whole plane 
is turning together, in this linear transformation. 


Lines to Lines, Triangles to Triangles 


Figure 7.1 shows the line from v to w in the input space. It also shows the line from T (v) 
to T(w) in the output space. Linearity tells us: Every point on the input line goes onto 
the output line. And more than that: Equally spaced points go to equally spaced points. 
The middle point $u = \frac{1}{2}v + \frac{1}{2}w$ goes to the middle point $T(u) = \frac{1}{2}T(v) + \frac{1}{2}T(w)$.

The second figure moves up a dimension. Now we have three corners $v_1, v_2, v_3$.
Those inputs have three outputs $T(v_1), T(v_2), T(v_3)$. The input triangle goes onto the
output triangle. Equally spaced points stay equally spaced (along the edges, and then
between the edges). The middle point $u = \frac{1}{3}(v_1 + v_2 + v_3)$ goes to the middle point
$T(u) = \frac{1}{3}(T(v_1) + T(v_2) + T(v_3))$.

The rule of linearity extends to combinations of three vectors or n vectors: 


Linearity   $u = c_1v_1 + c_2v_2 + \cdots + c_nv_n$ transforms to $T(u) = c_1T(v_1) + c_2T(v_2) + \cdots + c_nT(v_n)$.
Figure 7.1: Lines to lines, equal spacing to equal spacing, u = 0 to T(u) = 0.


Note Transformations have a language of their own. Where there is no matrix, we can’t 
talk about a column space. But the idea can be rescued. The column space consisted of all 
outputs Av. The nullspace consisted of all inputs for which Av = 0. Translate those into 
“range” and “kernel”: 

Range of T = set of all outputs T (v): range corresponds to column space 

Kernel of T = set of all inputs for which T (v) = 0: kernel corresponds to nullspace. 


The range is in the output space W. The kernel is in the input space V. When T is 
multiplication by a matrix, T(v) = Av, you can translate to column space and nullspace. 


Examples of Transformations (mostly linear) 


Example 4 Project every 3-dimensional vector straight down onto the xy plane. Then
T(x, y, z) = (x, y, 0). The range is that plane, which contains every T(v). The kernel is
the z axis (which projects down to zero). This projection is linear.


Example 5 Project every 3-dimensional vector onto the horizontal plane z = 1. The
vector v = (x, y, z) is transformed to T(v) = (x, y, 1). This transformation is not linear.
Why not? It doesn’t even transform v = 0 into T(v) = 0.


Multiply every 3-dimensional vector by a 3 by 3 matrix A. This T(v) = Av is linear.
T(v + w) = A(v + w) does equal Av + Aw = T(v) + T(w).


Example 6 Suppose A is an invertible matrix. The kernel of T is the zero vector; the
range W equals the domain V. Another linear transformation is multiplication by $A^{-1}$.
This is the inverse transformation $T^{-1}$, which brings every vector T(v) back to v:


$T^{-1}(T(v)) = v$ matches the matrix multiplication $A^{-1}(Av) = v$.


We are reaching an unavoidable question. Are all linear transformations from $V = \mathbf{R}^n$
to $W = \mathbf{R}^m$ produced by matrices? When a linear T is described as a “rotation” or
“projection” or “. . .”, is there always a matrix hiding behind T?

The answer is yes. This is an approach to linear algebra that doesn’t start with 
matrices. The next section shows that we still end up with matrices. 
Linear Transformations of the Plane


It is more interesting to see a transformation than to define it. When a 2 by 2 matrix A 
multiplies all vectors in $\mathbf{R}^2$, we can watch how it acts. Start with a “house” that has eleven
endpoints. Those eleven vectors v are transformed into eleven vectors Av. Straight lines 
between v’s become straight lines between the transformed vectors Av. (The transfor- 
mation from house to house is linear!) Applying A to a standard house produces a new 
house—possibly stretched or rotated or otherwise unlivable. 

This part of the book is visual, not theoretical. We will show four houses and the 
matrices that produce them. The columns of H are the eleven corners of the first house.
(H is 2 by 12, so plot2d will connect the 11th corner to the first.) The 11 points in the 
house matrix H are multiplied by A to produce the corners AH of the other houses. 


\[ \text{House matrix} \qquad H = \left[\begin{array}{rrrrrrrrrrrr} -6 & -6 & -7 & 0 & 7 & 6 & 6 & -3 & -3 & 0 & 0 & -6 \\ -7 & 2 & 1 & 8 & 1 & 2 & -7 & -7 & -2 & -2 & -7 & -7 \end{array}\right] \]


Figure 7.2: Linear transformations of a house drawn by plot2d(A * H).


= REVIEW OF THE KEY IDEAS = 


1. A transformation T takes each v in the input space to T (v) in the output space. 


2. T is linear if T(v + w) = T(v) + T(w) and T(cv) = cT(v): lines to lines. 
3. Combinations to combinations: $T(c_1v_1 + \cdots + c_nv_n) = c_1T(v_1) + \cdots + c_nT(v_n)$.


4. The transformation $T(v) = Av + v_0$ is linear only if $v_0 = 0$. Then T(v) = Av.


= WORKED EXAMPLES =


7.1 A The elimination matrix $\begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix}$ gives a shearing transformation from (x, y) to
T(x, y) = (x, x + y). Draw the xy plane and show what happens to (1, 0) and (1, 1).
What happens to points on the vertical lines x = 0 and x = a? If the inputs fill the unit
square $0 \le x \le 1$, $0 \le y \le 1$, draw the outputs (the transformed square).


Solution The points (1, 0) and (2, 0) on the x axis transform by T to (1, 1) and (2, 2).
The horizontal x axis transforms to the 45° line (going through (0, 0) of course). The points
on the y axis are not moved because T(0, y) = (0, y). The y axis is the line of eigenvectors
of T with $\lambda = 1$. Points with x = a move up by a.
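

The solution above can be traced with a few lines of code. This sketch is an illustration only: `shear` applies T(x, y) = (x, x + y) to the corners of the unit square, and `polygon_area` is a hypothetical helper (the shoelace formula) used to confirm that the shear keeps the area at 1.

```python
def shear(point):
    """T(x, y) = (x, x + y): multiplication by the elimination matrix [1 0; 1 1]."""
    x, y = point
    return (x, x + y)

def polygon_area(corners):
    """Shoelace formula for the area of a polygon given its corners in order."""
    total = 0
    n = len(corners)
    for i in range(n):
        x1, y1 = corners[i]
        x2, y2 = corners[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2

square = [(0, 0), (1, 0), (1, 1), (0, 1)]
parallelogram = [shear(p) for p in square]

print(parallelogram)                    # [(0, 0), (1, 1), (1, 2), (0, 1)]
print(polygon_area(square))             # 1.0
print(polygon_area(parallelogram))      # 1.0
```

The corners on the y axis stay put, the corners on the line x = 1 slide up by 1, and the area is unchanged because the matrix has determinant 1.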


[Figure: This is the shearing. Vertical lines slide up. Squares to parallelograms.]


7.1 B A nonlinear transformation T is invertible if every b in the output space comes
from exactly one x in the input space: T(x) = b always has exactly one solution.
Which of these transformations (on real numbers x) is invertible and what is $T^{-1}$?
None are linear, not even $T_3$. When you solve T(x) = b, you are inverting T:


\[ T_1(x) = x^2 \qquad T_2(x) = x^3 \qquad T_3(x) = x + 9 \qquad T_4(x) = e^x \qquad T_5(x) = \frac{1}{x} \ \text{for nonzero } x\text{’s} \]


Solution $T_1$ is not invertible: $x^2 = 1$ has two solutions and $x^2 = -1$ has no solution.
$T_4$ is not invertible because $e^x = -1$ has no solution. (If the output space
changes to positive b’s then the inverse of $e^x = b$ is $x = \ln b$.)


Notice $T_5^2$ = identity. But $T_3^2(x) = x + 18$. What are $T_2^2(x)$ and $T_4^2(x)$?


$T_2, T_3, T_5$ are invertible. The solutions to $x^3 = b$ and $x + 9 = b$ and $\frac{1}{x} = b$ are unique:


\[ x = T_2^{-1}(b) = b^{1/3} \qquad x = T_3^{-1}(b) = b - 9 \qquad x = T_5^{-1}(b) = 1/b \]
Problem Set 7.1


1 A linear transformation must leave the zero vector fixed: T(0) = 0. Prove this from
T(v + w) = T(v) + T(w) by choosing w = ____ (and finish the proof). Prove it
also from T(cv) = cT(v) by choosing c = ____.


2 Requirement (b) gives T(cv) = cT(v) and also T(dw) = dT(w). Then by addition,
requirement (a) gives T( ) = ( ). What is T(cv + dw + eu)?


3 Which of these transformations are not linear? The input is $v = (v_1, v_2)$:
(a) $T(v) = (v_2, v_1)$   (b) $T(v) = (v_1, v_1)$   (c) $T(v) = (0, v_1)$
(d) $T(v) = (0, 1)$   (e) $T(v) = v_1 - v_2$   (f) $T(v) = v_1v_2$.


4 If S and T are linear transformations, is S(T(v)) linear or quadratic?

(a) (Special case) If S(v) = v and T(v) = v, then S(T(v)) = v or $v^2$?

(b) (General case) $S(w_1 + w_2) = S(w_1) + S(w_2)$ and $T(v_1 + v_2) = T(v_1) + T(v_2)$
combine into

$S(T(v_1 + v_2)) = S(\_\_\_\_) = \_\_\_\_ + \_\_\_\_$.


5 Suppose T(v) = v except that $T(0, v_2) = (0, 0)$. Show that this transformation
satisfies T(cv) = cT(v) but not T(v + w) = T(v) + T(w).


6 Which of these transformations satisfy T(v + w) = T(v) + T(w) and which satisfy
T(cv) = cT(v)?

(a) $T(v) = v/\|v\|$   (b) $T(v) = v_1 + v_2 + v_3$   (c) $T(v) = (v_1, 2v_2, 3v_3)$
(d) T(v) = largest component of v.


7 For these transformations of $V = \mathbf{R}^2$ to $W = \mathbf{R}^2$, find T(T(v)). Is this
transformation $T^2$ linear?

(a) T(v) = −v   (b) T(v) = v + (1, 1)
(c) T(v) = 90° rotation = $(-v_2, v_1)$
(d) T(v) = projection = $\left(\frac{v_1 + v_2}{2}, \frac{v_1 + v_2}{2}\right)$.


8 Find the range and kernel (like the column space and nullspace) of T:

(a) $T(v_1, v_2) = (v_1 - v_2, 0)$   (b) $T(v_1, v_2, v_3) = (v_1, v_2)$
(c) $T(v_1, v_2) = (0, 0)$   (d) $T(v_1, v_2) = (v_1, v_1)$.


9 The “cyclic” transformation T is defined by $T(v_1, v_2, v_3) = (v_2, v_3, v_1)$. What is
T(T(v))? What is $T^3(v)$? What is $T^{100}(v)$? Apply T a hundred times to v.
10 A linear transformation from V to W has an inverse from W to V when the range is
all of W and the kernel contains only v = 0. Then T(v) = w has one solution v for
each w in W. Why are these T’s not invertible?


(a) $T(v_1, v_2) = (v_2, v_2)$   $W = \mathbf{R}^2$
(b) $T(v_1, v_2) = (v_1, v_2, v_1 + v_2)$   $W = \mathbf{R}^3$
(c) $T(v_1, v_2) = v_1$   $W = \mathbf{R}^1$


11 If T(v) = Av and A is m by n, then T is “multiplication by A.”


(a) What are the input and output spaces V and W? 
(b) Why is range of T = column space of A? 
(c) Why is kernel of T = nullspace of A? 


12 Suppose a linear T transforms (1, 1) to (2, 2) and (2, 0) to (0, 0). Find T (v): 
(a) v = (2, 2)   (b) v = (3, 1)   (c) v = (−1, 1)   (d) v = (a, b).
Problems 13-19 may be harder. The input space V contains all 2 by 2 matrices M. 


13 M is any 2 by 2 matrix and $A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}$. The transformation T is defined by
T(M) = AM. What rules of matrix multiplication show that T is linear?


14 Suppose $A = \begin{bmatrix} 1 & 2 \\ 3 & 5 \end{bmatrix}$. Show that the range of T is the whole matrix space V and the
kernel is the zero matrix: 
(1) If AM = 0 prove that M must be the zero matrix. 
(2) Find a solution to AM = B for any 2 by 2 matrix B. 


15 Suppose $A = \begin{bmatrix} 1 & 2 \\ 3 & 6 \end{bmatrix}$. Show that the identity matrix I is not in the range of T. Find a
nonzero matrix M such that T(M) = AM is zero. 


16 Suppose T transposes every matrix M. Try to find a matrix A which gives $AM = M^T$
for every M. Show that no matrix A will do it. To professors: Is this a linear
transformation that doesn’t come from a matrix? 


17 The transformation T that transposes every matrix is definitely linear. Which of these 
extra properties are true? 
(a) $T^2$ = identity transformation.
(b) The kernel of T is the zero matrix. 
(c) Every matrix is in the range of T. 
(d) T(M) = —M is impossible. 


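18 Suppose $T(M) = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} M \end{bmatrix} \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}$. Find a matrix with $T(M) \ne 0$. Describe all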
matrices with T(M) = 0 (the kernel) and all output matrices T(M) (the range). 


19 If A and B are invertible and T(M) = AMB, find $T^{-1}(M)$ in the form ( )M( ).
Questions 20-26 are about house transformations. The output is T(H) = AH.
20 How can you tell from the picture of T (house) that A is 

(a) a diagonal matrix? 

(b) a rank-one matrix? 


(c) a lower triangular matrix? 


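21 Draw a picture of T (house) for these matrices:

\[ D = \begin{bmatrix} 2 & 0 \\ 0 & 1 \end{bmatrix} \quad \text{and} \quad A = \begin{bmatrix} .7 & .7 \\ .3 & .3 \end{bmatrix} \quad \text{and} \quad U = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}. \]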
22 What are the conditions on $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$ to ensure that T (house) will


(a) sit straight up? 
(b) expand the house by 3 in all directions? 
(c) rotate the house with no change in its shape? 


23 Describe T (house) when T (v) = —v + (1, 0). This T is “affine”. 
24 Change the house matrix H to add a chimney. 
25 The standard house is drawn by plot2d(H). Circles from o and lines from −:


x = H(1,:)'; y = H(2,:)';
axis([-10 10 -10 10]), axis('square')
plot(x, y, 'o', x, y, '-');


Test plot2d(A' * H) and plot2d(A' * A * H) with the matrices in Figure 7.1.


26 Without a computer sketch the houses A * H for these matrices A: 


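\[ \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \quad \text{and} \quad \begin{bmatrix} .5 & .5 \\ .5 & .5 \end{bmatrix} \quad \text{and} \quad \begin{bmatrix} .5 & .5 \\ -.5 & .5 \end{bmatrix} \quad \text{and} \quad \begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix}. \]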


27 This code creates a vector theta of 50 angles. It draws the unit circle and then 
T (circle) = ellipse. T(v) = Av takes circles to ellipses. 


A = [2 1; 1 2] % You can change A
theta = [0:2*pi/50:2*pi];
circle = [cos(theta); sin(theta)];
ellipse = A * circle;
axis([-4 4 -4 4]); axis('square')
plot(circle(1,:), circle(2,:), ellipse(1,:), ellipse(2,:))


28 Add two eyes and a smile to the circle in Problem 27. (If one eye is dark and the 
other is light, you can tell when the face is reflected across the y axis.) Multiply by 
matrices A to get new faces. 
Challenge Problems


29 What conditions on det A = ad − bc ensure that the output house AH will

(a) be squashed onto a line?
(b) keep its endpoints in clockwise order (not reflected)?
(c) have the same area as the original house?


30 From $A = U\Sigma V^T$ (Singular Value Decomposition) A takes circles to ellipses.
$AV = U\Sigma$ says that the radius vectors $v_1$ and $v_2$ of the circle go to the semi-axes
$\sigma_1u_1$ and $\sigma_2u_2$ of the ellipse. Draw the circle and the ellipse for $\theta = 30°$:

\[ V = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} \qquad U = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \qquad \Sigma = \begin{bmatrix} 2 & 0 \\ 0 & 1 \end{bmatrix}. \]


31 Why does every linear transformation T from $\mathbf{R}^2$ to $\mathbf{R}^2$ take squares to
parallelograms? Rectangles also go to parallelograms (squashed if T is not invertible).
7.2 The Matrix of a Linear Transformation


The next pages assign a matrix to every linear transformation T. For ordinary column
vectors, the input v is in $V = \mathbf{R}^n$ and the output T(v) is in $W = \mathbf{R}^m$. The matrix A for
this transformation T will be m by n. Our choice of bases in V and W will decide A.

The standard basis vectors for $\mathbf{R}^n$ and $\mathbf{R}^m$ are the columns of I. That choice leads to
a standard matrix, and T(v) = Av in the normal way. But these spaces also have other
bases, so the same T is represented by other matrices. A main theme of linear algebra is to
choose the bases that give the best matrix for T.

When V and W are not $\mathbf{R}^n$ and $\mathbf{R}^m$, they still have bases. Each choice of basis leads
to a matrix for T. When the input basis is different from the output basis, the matrix for
T(v) = v will not be the identity I. It will be the “change of basis matrix”.


Key idea of this section 


Suppose we know $T(v_1), \ldots, T(v_n)$ for the basis vectors $v_1, \ldots, v_n$.
Then linearity produces T(v) for every other input vector v.


Reason Every v is a unique combination $c_1v_1 + \cdots + c_nv_n$ of the basis vectors $v_i$.
Since T is a linear transformation (here is the moment for linearity), T(v) must be
the same combination $c_1T(v_1) + \cdots + c_nT(v_n)$ of the known outputs $T(v_i)$.


Our first example gives the outputs T(v) for the standard basis vectors (1,0) and (0, 1). 


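Example 1 Suppose T transforms $v_1 = (1, 0)$ to $T(v_1) = (2, 3, 4)$. Suppose the second
basis vector $v_2 = (0, 1)$ goes to $T(v_2) = (5, 5, 5)$. If T is linear from $\mathbf{R}^2$ to $\mathbf{R}^3$ then its
“standard matrix” is 3 by 2. Those outputs $T(v_1)$ and $T(v_2)$ go into its columns:


\[ A = \begin{bmatrix} 2 & 5 \\ 3 & 5 \\ 4 & 5 \end{bmatrix}. \qquad T(v_1 + v_2) = T(v_1) + T(v_2) \ \text{combines the columns} \quad \begin{bmatrix} 2 & 5 \\ 3 & 5 \\ 4 & 5 \end{bmatrix} \begin{bmatrix} 1 \\ 1 \end{bmatrix} = \begin{bmatrix} 7 \\ 8 \\ 9 \end{bmatrix}. \]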

Example 2 The derivatives of the functions $1, x, x^2, x^3$ are $0, 1, 2x, 3x^2$. Those are four
facts about the transformation T that “takes the derivative”. The inputs and the outputs are
functions! Now add the crucial fact that the “derivative transformation” T is linear:


\[ T(v) = \frac{dv}{dx} \quad \text{obeys the linearity rule} \quad \frac{d}{dx}(cv + dw) = c\,\frac{dv}{dx} + d\,\frac{dw}{dx}. \]


It is exactly this linearity that you use to find all other derivatives. From the derivative
of each separate power $1, x, x^2, x^3$ (those are the basis vectors $v_1, v_2, v_3, v_4$) you find the
derivative of any polynomial like $4 + x + x^2 + x^3$:


\[ \frac{d}{dx}(4 + x + x^2 + x^3) = 1 + 2x + 3x^2 \quad \text{(because of linearity!)} \]




This example applies T (the derivative d/dx) to the input $v = 4v_1 + v_2 + v_3 + v_4$. Here
the input space V contains all combinations of $1, x, x^2, x^3$. I call them vectors, you might
call them functions. Those four vectors are a basis for the space V of cubic polynomials
(degree $\le 3$). Four derivatives tell us all derivatives in V.


For the nullspace of A, we solve Av = 0. For the kernel of the derivative T, we solve
dv/dx = 0. The solution is v = constant. The nullspace of T is one-dimensional,
containing all constant functions (like the first basis function $v_1 = 1$).

To find the range (or column space), look at all outputs from T(v) = dv/dx. The
inputs are cubic polynomials $a + bx + cx^2 + dx^3$, so the outputs are quadratic polynomials
(degree $\le 2$). For the output space W we have a choice. If W = cubics, then the range of
T (the quadratics) is a subspace. If W = quadratics, then the range is all of W.

That second choice emphasizes the difference between the domain or input space (V =
cubics) and the image or output space (W = quadratics). V has dimension n = 4 and W
has dimension m = 3. The “derivative matrix” below will be 3 by 4.

The range of T is a three-dimensional subspace. The matrix will have rank r = 3. 
The kernel is one-dimensional. The sum 3 + 1 = 4 is the dimension of the input space. 
This was r + (n — r) = n in the Fundamental Theorem of Linear Algebra. Always 
(dimension of range) + (dimension of kernel) = dimension of input space. 


Example 3 The integral is the inverse of the derivative. That is the Fundamental Theorem
of Calculus. We see it now in linear algebra. The transformation $T^{-1}$ that “takes the
integral from 0 to x” is linear! Apply $T^{-1}$ to $1, x, x^2$, which are $w_1, w_2, w_3$:


\[ \text{Integration is } T^{-1} \qquad \int_0^x 1\,dx = x, \quad \int_0^x x\,dx = \tfrac{1}{2}x^2, \quad \int_0^x x^2\,dx = \tfrac{1}{3}x^3. \]


By linearity, the integral of $w = B + Cx + Dx^2$ is $T^{-1}(w) = Bx + \frac{1}{2}Cx^2 + \frac{1}{3}Dx^3$.
The integral of a quadratic is a cubic. The input space of $T^{-1}$ is the quadratics, the output
space is the cubics. Integration takes W back to V. Its matrix will be 4 by 3.


Range of $T^{-1}$   The outputs $Bx + \frac{1}{2}Cx^2 + \frac{1}{3}Dx^3$ are cubics with no constant term.
Kernel of $T^{-1}$   The output is zero only if B = C = D = 0. The nullspace is Z = {0}.


Fundamental Theorem   3 + 0 is the dimension of the input space W for $T^{-1}$.


Matrices for the Derivative and Integral 


We will show how the matrices $A$ and $A^{-1}$ copy the derivative $T$ and the integral $T^{-1}$.
This is an excellent example from calculus. (I write $A^{-1}$ but I don’t quite mean it.)
Then comes the general rule—how to represent any linear transformation T by a matrix A.
The derivative transforms the space V of cubics to the space W of quadratics. The
basis for V is $1, x, x^2, x^3$. The basis for W is $1, x, x^2$. The derivative matrix is 3 by 4:


\[ A = \begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 3 \end{bmatrix} = \text{matrix form of the derivative } T. \tag{2} \]
Why is A the correct matrix? Because multiplying by A agrees with transforming by T.
The derivative of $v = a + bx + cx^2 + dx^3$ is $T(v) = b + 2cx + 3dx^2$. The same numbers
b and 2c and 3d appear when we multiply by the matrix A:


\[ \text{Take the derivative} \qquad \begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 3 \end{bmatrix} \begin{bmatrix} a \\ b \\ c \\ d \end{bmatrix} = \begin{bmatrix} b \\ 2c \\ 3d \end{bmatrix}. \tag{3} \]


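Look also at $T^{-1}$. The integration matrix is 4 by 3. Watch how the following matrix starts
with $w = B + Cx + Dx^2$ and produces its integral $0 + Bx + \frac{1}{2}Cx^2 + \frac{1}{3}Dx^3$:


\[ \text{Take the integral} \qquad \begin{bmatrix} 0 & 0 & 0 \\ 1 & 0 & 0 \\ 0 & \frac{1}{2} & 0 \\ 0 & 0 & \frac{1}{3} \end{bmatrix} \begin{bmatrix} B \\ C \\ D \end{bmatrix} = \begin{bmatrix} 0 \\ B \\ \frac{1}{2}C \\ \frac{1}{3}D \end{bmatrix}. \tag{4} \]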


I want to call that matrix $A^{-1}$, and I will. But you realize that rectangular matrices don’t
have inverses. At least they don’t have two-sided inverses. This rectangular A has a one-
sided inverse. The integral is a one-sided inverse of the derivative!


\[ AA^{-1} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \quad \text{but} \quad A^{-1}A = \begin{bmatrix} 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}. \]


If you integrate a function and then differentiate, you get back to the start. So $AA^{-1} = I$.
But if you differentiate before integrating, the constant term is lost. The integral of the
derivative of 1 is zero:


$T^{-1}T(1)$ = integral of zero function = 0.
This matches $A^{-1}A$, whose first column is all zero. The derivative T has a kernel (the
constant functions). Its matrix A has a nullspace. Main point again: Av copies T(v). 
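

The two products above can be reproduced with exact fractions. This is a minimal sketch, assuming the bases $1, x, x^2, x^3$ and $1, x, x^2$ from the text; the names `A_int`, `matmul`, and `matvec` are hypothetical helpers introduced here.

```python
from fractions import Fraction as F

# Derivative matrix (3 by 4) and integration matrix (4 by 3) from equations (3) and (4).
A = [[0, 1, 0, 0],
     [0, 0, 2, 0],
     [0, 0, 0, 3]]
A_int = [[0, 0, 0],
         [1, 0, 0],
         [0, F(1, 2), 0],
         [0, 0, F(1, 3)]]

def matmul(X, Y):
    """Matrix product X times Y for lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def matvec(X, v):
    """Matrix times coefficient vector."""
    return [sum(x * vi for x, vi in zip(row, v)) for row in X]

derivative = matvec(A, [4, 1, 1, 1])   # coefficients of 4 + x + x^2 + x^3
print(derivative)                      # [1, 2, 3], meaning 1 + 2x + 3x^2

integrate_then_differentiate = matmul(A, A_int)   # equals the 3 by 3 identity
differentiate_then_integrate = matmul(A_int, A)   # 4 by 4, first row and column zero
print(integrate_then_differentiate)
print(differentiate_then_integrate)
```

The second product loses the constant term: its first column is zero, which is the matrix form of the kernel of the derivative.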
Construction of the Matrix 


Now we construct a matrix for any linear transformation. Suppose T transforms the space 
V (n-dimensional) to the space W (m-dimensional). We choose a basis $v_1, \ldots, v_n$ for V
and we choose a basis $w_1, \ldots, w_m$ for W. The matrix A will be m by n. To find the first
column of A, apply T to the first basis vector $v_1$. The output $T(v_1)$ is in W.


$T(v_1)$ is a combination $a_{11}w_1 + \cdots + a_{m1}w_m$ of the output basis for W.


These numbers $a_{11}, \ldots, a_{m1}$ go into the first column of A. Transforming $v_1$ to $T(v_1)$
matches multiplying (1, 0, ..., 0) by A. It yields that first column of the matrix.
When T is the derivative and the first basis vector is 1, its derivative is $T(v_1) = 0$.
So for the derivative matrix, the first column of A was all zero.

For the integral, the first basis function is again 1. Its integral is the second basis
function x. So the first column of $A^{-1}$ was (0, 1, 0, 0). Here is the construction of A.


Key rule: The jth column of A is found by applying T to the jth basis vector $v_j$

\[ T(v_j) = \text{combination of basis vectors of } W = a_{1j}w_1 + \cdots + a_{mj}w_m. \tag{6} \]


These numbers $a_{1j}, \ldots, a_{mj}$ go into column j of A. The matrix is constructed to get the
basis vectors right. Then linearity gets all other vectors right. Every v is a combination
$c_1v_1 + \cdots + c_nv_n$, and T(v) is a combination of the w’s. When A multiplies the coefficient
vector $c = (c_1, \ldots, c_n)$ in the v combination, Ac produces the coefficients in the T(v)
combination. This is because matrix multiplication (combining columns) is linear like T.

The matrix A tells us what T does. Every linear transformation from V to W can be 
converted to a matrix. This matrix depends on the bases. 


Example 4 If the bases change, T is the same but the matrix A is different. 

Suppose we reorder the basis to $x, x^2, x^3, 1$ for the cubics in V. Keep the original basis
$1, x, x^2$ for the quadratics in W. The derivative of the first basis vector $v_1 = x$ is the first
basis vector $w_1 = 1$. So the first column of A looks different:


\[ A_{\text{new}} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 2 & 0 & 0 \\ 0 & 0 & 3 & 0 \end{bmatrix} = \begin{array}{l} \text{matrix for the derivative } T \\ \text{when the bases change to} \\ x, x^2, x^3, 1 \text{ and } 1, x, x^2. \end{array} \]


When we reorder the basis of V, we reorder the columns of A. The input basis vector $v_j$
is responsible for column j. The output basis vector w; is responsible for row i. Soon the 
changes in the bases will be more than permutations. 
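

The key rule (column j comes from T applied to the jth basis vector) can be sketched for the derivative with monomial bases. The function `derivative_matrix` is a hypothetical helper written for this illustration; each basis is given as a list of powers of x, and every needed output power is assumed to be present in the output basis.

```python
def derivative_matrix(input_powers, output_powers):
    """Matrix of d/dx when the input basis is x^p for p in input_powers
    and the output basis is x^q for q in output_powers."""
    rows, cols = len(output_powers), len(input_powers)
    A = [[0] * cols for _ in range(rows)]
    for j, p in enumerate(input_powers):
        if p > 0:
            # T(x^p) = p * x^(p-1): the number p multiplies the output basis vector x^(p-1)
            i = output_powers.index(p - 1)
            A[i][j] = p
        # For p = 0 the derivative is zero, so column j stays all zero.
    return A

A = derivative_matrix([0, 1, 2, 3], [0, 1, 2])       # bases 1, x, x^2, x^3 and 1, x, x^2
A_new = derivative_matrix([1, 2, 3, 0], [0, 1, 2])   # reordered input basis x, x^2, x^3, 1

print(A)       # [[0, 1, 0, 0], [0, 0, 2, 0], [0, 0, 0, 3]]
print(A_new)   # [[1, 0, 0, 0], [0, 2, 0, 0], [0, 0, 3, 0]]
```

Reordering the input basis permutes the columns and leaves the rows alone, as Example 4 describes.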


Products AB Match Transformations TS


The examples of derivative and integral made three points. First, linear transformations T 
are everywhere—in calculus and differential equations and linear algebra. Second, spaces 
other than R” are important—we had functions in V and W. Third, T still boils down to a 
matrix A. Now we make sure that we can find this matrix. 

The next examples have V = W. We choose the same basis for both spaces. Then we 
can compare the matrices $A^2$ and AB with the transformations $T^2$ and TS. 


Example 5 T rotates every vector by the angle $\theta$. Here $V = W = \mathbf{R}^2$. Find A. 


Solution The standard basis is $v_1 = (1,0)$ and $v_2 = (0,1)$. To find A, apply T to those 
basis vectors. In Figure 7.3a, they are rotated by $\theta$. The first vector (1,0) swings around 
to $(\cos\theta, \sin\theta)$. This equals $\cos\theta$ times (1,0) plus $\sin\theta$ times (0,1). Therefore those 
numbers $\cos\theta$ and $\sin\theta$ go into the first column of A: 


$$\begin{bmatrix} \cos\theta \\ \sin\theta \end{bmatrix} \text{ shows column 1} \qquad A = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \text{ shows both columns.}$$


For the second column, transform the second vector (0, 1). The figure shows it rotated to 
$(-\sin\theta, \cos\theta)$. Those numbers go into the second column. Multiplying A times (0, 1) 
produces that column. A agrees with T on the basis, and on all v. 


Figure 7.3: Two transformations: Rotation by $\theta$ and projection onto the 45° line. 


Example 6 (Projection) Suppose T projects every plane vector onto the 45° line. 
Find its matrix for two different choices of the basis. We will find two matrices. 


Solution Start with a specially chosen basis, not drawn in Figure 7.3. The basis vector 
$v_1$ is along the 45° line. It projects to itself: $T(v_1) = v_1$. So the first column of A 
contains 1 and 0. The second basis vector $v_2$ is along the perpendicular line (135°). This 
basis vector projects to zero. So the second column of A contains 0 and 0: 


Projection $A = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}$ when V and W have the 45° and 135° basis. 


Now take the standard basis (1,0) and (0, 1). Figure 7.3b shows how (1,0) projects 
to $(\tfrac12, \tfrac12)$. That gives the first column of A. The other basis vector (0, 1) also projects to 
$(\tfrac12, \tfrac12)$. So the standard matrix for this projection is A: 


Same projection $A = \begin{bmatrix} \tfrac12 & \tfrac12 \\ \tfrac12 & \tfrac12 \end{bmatrix}$ for the standard basis. 


Both A’s are projection matrices. If you square A it doesn’t change. Projecting twice is 
the same as projecting once: $T^2 = T$ so $A^2 = A$. Notice what is hidden in that statement: 
The matrix for $T^2$ is $A^2$. 
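
Both projection matrices can be rebuilt from the transformation itself. This is a small Python sketch under stated assumptions: the function `project` uses the usual formula for projecting onto the line through a = (1, 1), and the special basis vectors are taken unnormalized as (1, 1) and (-1, 1). None of these names come from the book.

```python
def project(v, a=(1.0, 1.0)):
    """Project the plane vector v onto the line through a (the 45 degree line by default)."""
    t = (a[0] * v[0] + a[1] * v[1]) / (a[0] * a[0] + a[1] * a[1])
    return (t * a[0], t * a[1])

def matmul(A, B):
    """Plain matrix product for lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

# Standard basis: the projections of (1,0) and (0,1) are the columns of A.
col1 = project((1.0, 0.0))
col2 = project((0.0, 1.0))
A = [[col1[0], col2[0]],
     [col1[1], col2[1]]]
A_squared = matmul(A, A)

# Special basis: v1 on the 45 degree line projects to itself, v2 on the 135 degree line to zero.
image_v1 = project((1.0, 1.0))
image_v2 = project((-1.0, 1.0))
print(A, A_squared, image_v1, image_v2)
```

In the special basis the images are 1·v1 + 0·v2 and 0·v1 + 0·v2, which gives the columns (1, 0) and (0, 0).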
We have come to something important: the real reason for the way matrices are multiplied. 
At last we discover why! Two transformations S and T are represented by two matrices B 
and A. When we apply T to the output from S, we get the “composition” TS. When we 
apply A after B, we get the matrix product AB. Matrix multiplication gives the correct 
matrix AB to represent TS. 

The transformation S is from a space U to V. Its matrix B uses a basis $u_1, \ldots, u_p$ for 
U and a basis $v_1, \ldots, v_n$ for V. The matrix is n by p. The transformation T is from V 
to W as before. Its matrix A must use the same basis $v_1, \ldots, v_n$ for V; this is the output 
space for S and the input space for T. Then the matrix AB matches TS: 


Multiplication The linear transformation TS starts with any vector u in U, goes 
to S(u) in V and then to T(S(u)) in W. The matrix AB starts with any x in $\mathbf{R}^p$, 
goes to Bx in $\mathbf{R}^n$ and then to ABx in $\mathbf{R}^m$. The matrix AB correctly represents TS: 


$$TS: \ U \to V \to W \qquad AB: \ (m \text{ by } n)(n \text{ by } p) = (m \text{ by } p).$$


The input is $u = x_1 u_1 + \cdots + x_p u_p$. The output T(S(u)) matches the output ABx. 
Product of transformations matches product of matrices. 

The most important cases are when the spaces U, V, W are the same and their bases are 
the same. With m = n = p we have square matrices. 


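Example 7 S rotates the plane by $\theta$ and T also rotates by $\theta$. Then TS rotates by $2\theta$. 
This transformation $T^2$ corresponds to the rotation matrix $A^2$ through $2\theta$: 

$$T = S \qquad A = B \qquad T^2 = \text{rotation by } 2\theta \qquad A^2 = \begin{bmatrix} \cos 2\theta & -\sin 2\theta \\ \sin 2\theta & \cos 2\theta \end{bmatrix}. \tag{6}$$


By matching (transformation)$^2$ with (matrix)$^2$, we pick up the formulas for $\cos 2\theta$ 
and $\sin 2\theta$. Multiply A times A: 

$$\begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} = \begin{bmatrix} \cos^2\theta - \sin^2\theta & -2\sin\theta\cos\theta \\ 2\sin\theta\cos\theta & \cos^2\theta - \sin^2\theta \end{bmatrix}. \tag{7}$$


Comparing (6) with (7) produces $\cos 2\theta = \cos^2\theta - \sin^2\theta$ and $\sin 2\theta = 2\sin\theta\cos\theta$. 
Trigonometry (the double angle rule) comes from linear algebra. 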
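
A quick numerical check of the double angle rule, written as a hedged Python sketch. The test angle 0.7 and the helper names `rotation` and `matmul` are arbitrary choices for illustration, not part of the text.

```python
import math

def rotation(theta):
    """Matrix of rotation by theta in the standard basis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

def matmul(A, B):
    """Plain matrix product for lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

theta = 0.7                      # arbitrary test angle
A = rotation(theta)
A_squared = matmul(A, A)         # matrix for T^2
R_double = rotation(2 * theta)   # rotation through 2*theta

max_error = max(abs(A_squared[i][j] - R_double[i][j])
                for i in range(2) for j in range(2))
print(max_error)                 # only rounding error is expected
```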


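Example 8 S rotates by $\theta$ and T rotates by $-\theta$. Then TS = I matches AB = I. 

In this case T(S(u)) is u. We rotate forward and back. For the matrices to match, ABx 
must be x. The two matrices are inverses. Check this by putting $\cos(-\theta) = \cos\theta$ and 
$\sin(-\theta) = -\sin\theta$ into the backward rotation matrix: 


$$AB = \begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix} \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} = \begin{bmatrix} \cos^2\theta + \sin^2\theta & 0 \\ 0 & \cos^2\theta + \sin^2\theta \end{bmatrix} = I.$$


Earlier T took the derivative and S took the integral. The transformation TS is the 
identity but not ST. Therefore AB is the identity matrix but not BA: 


$$AB = \begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 3 \end{bmatrix} \begin{bmatrix} 0 & 0 & 0 \\ 1 & 0 & 0 \\ 0 & \tfrac12 & 0 \\ 0 & 0 & \tfrac13 \end{bmatrix} = I \quad \text{but} \quad BA = \begin{bmatrix} 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}.$$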
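
The one-sided inverse can be confirmed with exact fractions. This Python sketch assumes the bases 1, x, x^2, x^3 for cubics and 1, x, x^2 for quadratics, so that A is the 3 by 4 derivative matrix and B is the 4 by 3 integral matrix; the variable names are illustrative only.

```python
from fractions import Fraction as Fr

def matmul(A, B):
    """Plain matrix product for lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

A = [[0, 1, 0, 0],        # derivative: cubics -> quadratics
     [0, 0, 2, 0],
     [0, 0, 0, 3]]
B = [[0, 0, 0],           # integral: quadratics -> cubics
     [1, 0, 0],
     [0, Fr(1, 2), 0],
     [0, 0, Fr(1, 3)]]

AB = matmul(A, B)         # derivative of the integral
BA = matmul(B, A)         # integral of the derivative loses the constant term
print(AB)
print(BA)
```

The zero in the corner of BA is the constant that differentiation destroys and integration cannot bring back.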


The Identity Transformation and the Change of Basis Matrix 


We now find the matrix for the special and boring transformation T(v) = v. This 
identity transformation does nothing to v. The matrix for T = I also does nothing, 
provided the output basis is the same as the input basis. The output $T(v_1)$ is $v_1$. When the 
bases are the same, this is $w_1$. So the first column of A is (1,0,...,0). 


When each output $T(v_j) = v_j$ is the same as $w_j$, the matrix is just I. 


This seems reasonable: The identity transformation is represented by the identity matrix. 
But suppose the bases are different. Then $T(v_1) = v_1$ is a combination of the w’s. 
That combination $m_{11} w_1 + \cdots + m_{n1} w_n$ tells the first column of the matrix (call it M). 


Identity transformation When the outputs $T(v_j) = v_j$ are combinations 
$\sum_{i=1}^{n} m_{ij} w_i$, the “change of basis matrix” is M. 


The basis is changing but the vectors themselves are not changing: T(v) = v. When the 
inputs have one basis and the outputs have another basis, the matrix is not I. 


Example 9 The input basis is $v_1 = (3,7)$ and $v_2 = (2,5)$. The output basis is $w_1 = (1,0)$ 
and $w_2 = (0,1)$. Then the matrix M is easy to compute: 


Change of basis The matrix for T(v) = v is $M = \begin{bmatrix} 3 & 2 \\ 7 & 5 \end{bmatrix}$. 


Reason The first input is the basis vector $v_1 = (3,7)$. The output is also (3, 7) which we 
express as $3w_1 + 7w_2$. Then the first column of M contains 3 and 7. 

This seems too simple to be important. It becomes trickier when the change of basis 
goes the other way. We get the inverse of the previous matrix M: 


Example 10 The input basis is now $v_1 = (1,0)$ and $v_2 = (0,1)$. The outputs are just 
T(v) = v. But the output basis is now $w_1 = (3,7)$ and $w_2 = (2,5)$. 


Reverse the bases, invert the matrix: The matrix for T(v) = v is $\begin{bmatrix} 3 & 2 \\ 7 & 5 \end{bmatrix}^{-1} = \begin{bmatrix} 5 & -2 \\ -7 & 3 \end{bmatrix}$. 


Reason The first input is $v_1 = (1,0)$. The output is also $v_1$ but we express it as $5w_1 - 7w_2$. 
Check that 5(3, 7) - 7(2, 5) does produce (1,0). We are combining the columns of 
the previous M to get the columns of I. The matrix to do that is $M^{-1}$. 

Change basis, change back: $\begin{bmatrix} w_1 & w_2 \end{bmatrix} \begin{bmatrix} 5 & -2 \\ -7 & 3 \end{bmatrix} = \begin{bmatrix} v_1 & v_2 \end{bmatrix}$ is $MM^{-1} = I$. 


A mathematician would say that the matrix $MM^{-1}$ corresponds to the product of two 
identity transformations. We start and end with the same basis (1,0) and (0, 1). Matrix 
multiplication must give I. So the two change of basis matrices are inverses. 
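
Examples 9 and 10 can be checked in a few lines. The sketch below is an illustration in Python with exact fractions; `inverse_2x2` is a hypothetical helper using the standard 2 by 2 inverse formula.

```python
from fractions import Fraction as Fr

def matmul(A, B):
    """Plain matrix product for lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def inverse_2x2(M):
    """Inverse of a 2 by 2 matrix with nonzero determinant."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[Fr(d) / det, Fr(-b) / det],
            [Fr(-c) / det, Fr(a) / det]]

M = [[3, 2],
     [7, 5]]                      # columns are (3,7) and (2,5)
M_inv = inverse_2x2(M)

# Column 1 of M_inv holds the coordinates of (1,0) in the basis (3,7), (2,5).
coords_of_e1 = [row[0] for row in M_inv]
rebuilt = [coords_of_e1[0] * 3 + coords_of_e1[1] * 2,
           coords_of_e1[0] * 7 + coords_of_e1[1] * 5]
print(M_inv, coords_of_e1, rebuilt)
```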


One thing is sure. Multiplying A times (1,0,...,0) gives column 1 of the matrix. The 
novelty of this section is that (1,0,...,0) stands for the first vector $v_1$, written in the 
basis of v’s. Then column 1 of the matrix is that same vector $v_1$, written in the standard basis. 


Wavelet Transform = Change to Wavelet Basis 


Wavelets are little waves. They have different lengths and they are localized at different 
places. The first basis vector is not actually a wavelet, it is the very useful flat vector of all 
ones. This example shows “Haar wavelets”: 


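$$\textbf{Haar basis} \quad w_1 = \begin{bmatrix} 1 \\ 1 \\ 1 \\ 1 \end{bmatrix} \quad w_2 = \begin{bmatrix} 1 \\ 1 \\ -1 \\ -1 \end{bmatrix} \quad w_3 = \begin{bmatrix} 1 \\ -1 \\ 0 \\ 0 \end{bmatrix} \quad w_4 = \begin{bmatrix} 0 \\ 0 \\ 1 \\ -1 \end{bmatrix} \tag{8}$$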


Those vectors are orthogonal, which is good. You see how $w_3$ is localized in the first 
half and $w_4$ is localized in the second half. The wavelet transform finds the coefficients 
$c_1, c_2, c_3, c_4$ when the input signal $v = (v_1, v_2, v_3, v_4)$ is expressed in the wavelet basis: 

$$v = c_1 w_1 + c_2 w_2 + c_3 w_3 + c_4 w_4 = Wc. \tag{9}$$


The coefficients $c_3$ and $c_4$ tell us about details in the first half and last half of v. The 
coefficient $c_1$ is the average. 

Why do we want to change the basis? I think of $v_1, v_2, v_3, v_4$ as the intensities of a signal. 
In audio they are volumes of sound. In images they are pixel values on a scale of black 
to white. An electrocardiogram is a medical signal. Of course n = 4 is very short, and 
n = 10,000 is more realistic. We may need to compress that long signal, by keeping only 
the largest 5% of the coefficients. This is 20 : 1 compression and (to give only two of its 
applications) it makes High Definition TV and video conferencing possible. 

If we keep only 5% of the standard basis coefficients, we lose 95% of the signal. 
In image processing, 95% of the image disappears. In audio, 95% of the tape goes blank. 
But if we choose a better basis of w’s, 5% of the basis vectors can combine to come very 
close to the original signal. In image processing and audio coding, you can’t see or hear 
the difference. We don’t need the other 95%! 

One good basis vector is the flat (1,1,1,1). That part alone can represent the constant 
background of our image. A short wave like (0,0,1,-1) or in higher dimensions 
(0,0,0,0,0,0,1,-1) represents a detail at the end of the signal. 

The three steps are the transform and compression and inverse transform. 


input $v$ → coefficients $c$ → compressed coefficients $\hat{c}$ → compressed signal $\hat{v}$ 


In linear algebra, where everything is perfect, we omit the compression step. The output 
$\hat{v}$ is exactly the same as the input v. The transform gives $c = W^{-1}v$ and the reconstruction 
brings back v = Wc. In true signal processing, where nothing is perfect but everything is 
fast, the transform (lossless) and the compression (which only loses unnecessary information) 
are absolutely the keys to success. The output is $\hat{v} = W\hat{c}$. 

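I will show those steps for a typical vector like v = (6, 4, 5, 1). Its wavelet coefficients 
are c = (4, 1, 1, 2). The reconstruction $4w_1 + w_2 + w_3 + 2w_4$ is v = Wc: 


$$\begin{bmatrix} 6 \\ 4 \\ 5 \\ 1 \end{bmatrix} = 4 \begin{bmatrix} 1 \\ 1 \\ 1 \\ 1 \end{bmatrix} + \begin{bmatrix} 1 \\ 1 \\ -1 \\ -1 \end{bmatrix} + \begin{bmatrix} 1 \\ -1 \\ 0 \\ 0 \end{bmatrix} + 2 \begin{bmatrix} 0 \\ 0 \\ 1 \\ -1 \end{bmatrix} = \begin{bmatrix} 1 & 1 & 1 & 0 \\ 1 & 1 & -1 & 0 \\ 1 & -1 & 0 & 1 \\ 1 & -1 & 0 & -1 \end{bmatrix} \begin{bmatrix} 4 \\ 1 \\ 1 \\ 2 \end{bmatrix} \tag{10}$$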


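Those coefficients c are $W^{-1}v$. Inverting this basis matrix W is easy because the w’s in its 
columns are orthogonal. But they are not unit vectors, so rescale: 


$$W^{-1} = \begin{bmatrix} \tfrac14 & & & \\ & \tfrac14 & & \\ & & \tfrac12 & \\ & & & \tfrac12 \end{bmatrix} \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & 0 & 0 \\ 0 & 0 & 1 & -1 \end{bmatrix}$$


The $\tfrac14$’s in the first row of $c = W^{-1}v$ mean that $c_1 = 4$ is the average of 6, 4, 5, 1. 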
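
Because the Haar vectors are orthogonal, each coefficient is a dot product divided by a squared length. The following Python sketch illustrates that rescaling on v = (6, 4, 5, 1); the function names are assumptions, and exact fractions are used so the arithmetic is visible.

```python
from fractions import Fraction as Fr

# Columns of W: the Haar basis vectors w1, w2, w3, w4 from equation (8).
haar_basis = [[1, 1, 1, 1],
              [1, 1, -1, -1],
              [1, -1, 0, 0],
              [0, 0, 1, -1]]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def wavelet_coefficients(v):
    """c = W^{-1} v, using orthogonality: c_k = (w_k . v) / (w_k . w_k)."""
    return [Fr(dot(w, v), dot(w, w)) for w in haar_basis]

def reconstruct(c):
    """v = W c = c1 w1 + c2 w2 + c3 w3 + c4 w4."""
    return [sum(c[k] * haar_basis[k][i] for k in range(4)) for i in range(4)]

v = [6, 4, 5, 1]
c = wavelet_coefficients(v)
v_back = reconstruct(c)
print(c, v_back)
```

The divisors 4, 4, 2, 2 are the squared lengths of the four basis vectors, matching the diagonal factors 1/4, 1/4, 1/2, 1/2.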


Example 11 (Same wavelet basis by recursion) I can’t resist showing you a faster 
way to find the c’s. The special point of the wavelet basis is that you can pick off the 
details in $c_3$ and $c_4$, before the coarse details in $c_2$ and the overall average in $c_1$. A picture 
will explain this “multiscale” method, which is in Chapter 1 of my textbook with Nguyen 
on Wavelets and Filter Banks (Wellesley-Cambridge Press). 

Split v = (6, 4, 5, 1) into averages and waves at small scale and then large scale: 


[Figure: v = (6, 4, 5, 1) is split at small scale into averages and differences/2, which give $c_3$ and $c_4$; the averages are then split at large scale into an overall average and a difference/2.] 
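
The recursion can be sketched in Python as follows. This is an illustrative implementation under the assumption that the signal length is a power of 2 and that coefficients are ordered as in the Haar basis above (average first, then coarse to fine details); the name `haar_multiscale` is not from the book.

```python
from fractions import Fraction as Fr

def haar_multiscale(v):
    """Pick off fine details first, then coarser ones, ending with the overall average."""
    averages = [Fr(x) for x in v]
    details = []                      # detail coefficients, coarsest level first
    while len(averages) > 1:
        pairs = [(averages[i], averages[i + 1])
                 for i in range(0, len(averages), 2)]
        level = [(a - b) / 2 for a, b in pairs]       # differences / 2
        averages = [(a + b) / 2 for a, b in pairs]    # averages
        details = level + details     # coarser levels are placed in front
    return averages + details

c = haar_multiscale([6, 4, 5, 1])
print(c)
```

For (6, 4, 5, 1) the first pass gives averages (5, 3) and details (1, 2); the second pass splits (5, 3) into the average 4 and the detail 1. Each pass works on half as much data, which is why the recursion is faster than multiplying by the full inverse matrix.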
Fourier Transform (DFT) = Change to Fourier Basis 


The first thing an electrical engineer does with a signal is to take its Fourier transform. 
For finite vectors we are speaking about the Discrete Fourier Transform. The DFT 
involves complex numbers (powers of $e^{2\pi i/n}$). But if we choose n = 4, the matrices 
are small and the only complex numbers are i and $i^3 = -i$. A true electrical engineer 
would write j instead of i for $\sqrt{-1}$. 


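Fourier basis $w_1$ to $w_4$ in the columns of F: 

$$F = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & i & i^2 & i^3 \\ 1 & i^2 & i^4 & i^6 \\ 1 & i^3 & i^6 & i^9 \end{bmatrix}$$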


The first column is the useful flat basis vector (1, 1, 1, 1). It represents the average signal 
or the direct current (the DC term). It is a wave at zero frequency. The third column is 
(1,—1,1,—1), which alternates at the highest frequency. The Fourier transform decom- 
poses the signal into waves at equally spaced frequencies. 

The Fourier matrix F is absolutely the most important complex matrix in mathematics 
and science and engineering. Section 10.3 of this book explains the Fast Fourier 
Transform: it can be seen as a factorization of F into matrices with many zeros. 
The FFT has revolutionized entire industries, by speeding up the Fourier transform. 
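The beautiful thing is that $F^{-1}$ looks like F, with i changed to $-i$: 


Fourier transform v to c: $v = c_1 w_1 + \cdots + c_n w_n = Fc$, with Fourier coefficients $c = F^{-1}v$ and 

$$F^{-1} = \frac14 \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & (-i) & (-i)^2 & (-i)^3 \\ 1 & (-i)^2 & (-i)^4 & (-i)^6 \\ 1 & (-i)^3 & (-i)^6 & (-i)^9 \end{bmatrix} = \frac14 \overline{F}.$$
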
The MATLAB command c = fft(v) produces the Fourier coefficients $c_1, \ldots, c_n$ of the 
vector v. It multiplies v by $F^{-1}$ (fast), up to scaling: MATLAB’s fft applies $\overline{F}$ without the factor 1/n. 


REVIEW OF THE KEY IDEAS 


1. If we know $T(v_1), \ldots, T(v_n)$ for a basis, linearity will determine all other T(v). 


2. A linear transformation T, with input basis $v_1, \ldots, v_n$ and output basis $w_1, \ldots, w_m$, 
leads to the matrix A (m by n) that represents T in these bases. 


3. The derivative and integral matrices are one-sided inverses: d(constant)/dx = 0: 


(Derivative)(Integral) = I is the Fundamental Theorem of Calculus. 


4. If A and B represent T and S, and the output basis for S is the input basis for T, 
then the matrix AB represents the transformation T(S(u)). 
5. The change of basis matrix M represents T(v) = v. Its columns are the coefficients 
of the input basis vectors expressed in the output basis: $v_j = m_{1j} w_1 + \cdots + m_{nj} w_n$. 


WORKED EXAMPLES 


7.2 A Using the standard basis, find the 4 by 4 matrix P that represents a cyclic 
permutation T from $x = (x_1, x_2, x_3, x_4)$ to $T(x) = (x_4, x_1, x_2, x_3)$. Find the matrix 
for $T^2$. What is the triple shift $T^3(x)$ and why is $T^3 = T^{-1}$? 

Find two real independent eigenvectors of P. What are all the eigenvalues of P? 


Solution The first vector (1, 0, 0, 0) in the standard basis transforms to (0, 1,0, 0) which 
is the second basis vector. So the first column of P is (0, 1,0, 0). The other three columns 
come from transforming the other three standard basis vectors: 


$$P = \begin{bmatrix} 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix} \qquad \text{Then } P \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix} = \begin{bmatrix} x_4 \\ x_1 \\ x_2 \\ x_3 \end{bmatrix} \text{ copies } T.$$


Since we used the standard basis, T is ordinary multiplication by P. The matrix for $T^2$ is 
a “double cyclic shift” $P^2$ and it produces $(x_3, x_4, x_1, x_2)$. 

The triple shift $T^3$ will transform $x = (x_1, x_2, x_3, x_4)$ to $T^3(x) = (x_2, x_3, x_4, x_1)$. 
If we apply T once more we are back to the original x. So $T^4$ = identity transformation 
and $P^4$ = identity matrix. 


Two real eigenvectors of P are (1, 1, 1, 1) with eigenvalue $\lambda = 1$ and (1, -1, 1, -1) 
with eigenvalue $\lambda = -1$. The shift leaves (1, 1, 1, 1) unchanged and it reverses signs in 
(1, -1, 1, -1). The other eigenvalues are i and $-i$. The determinant is $\lambda_1\lambda_2\lambda_3\lambda_4 = -1$. 

Notice that the eigenvalues 1, i, -1, $-i$ add to zero (the trace of P). They are the 
4th roots of 1, since $\det(P - \lambda I) = \lambda^4 - 1$. They are at angles 0°, 90°, 180°, 270° 
in the complex plane. The Fourier matrix F is the eigenvector matrix for P. 
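
The claims about P can be verified directly. The Python sketch below is an illustration: it checks the cyclic shift, the fourth power, and that column k of the 4 by 4 Fourier matrix (entries i^(jk)) is an eigenvector with eigenvalue (-i)^k. The helper names are assumptions.

```python
P = [[0, 0, 0, 1],
     [1, 0, 0, 0],
     [0, 1, 0, 0],
     [0, 0, 1, 0]]

def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

shifted = matvec(P, [1, 2, 3, 4])          # cyclic shift of (x1, x2, x3, x4)
P4 = matmul(matmul(P, P), matmul(P, P))    # four shifts return to the start

# Column k of the Fourier matrix has entries i^(jk), j = 0, 1, 2, 3.
fourier_columns = [[1j ** (j * k) for j in range(4)] for k in range(4)]
residual = 0.0
for k, x in enumerate(fourier_columns):
    lam = (-1j) ** k                       # eigenvalues 1, -i, -1, i
    Px = matvec(P, x)
    residual = max(residual, max(abs(Px[j] - lam * x[j]) for j in range(4)))
print(shifted, P4, residual)
```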


7.2 B The space of 2 by 2 matrices has these four “vectors” as a basis: 


$$v_1 = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} \quad v_2 = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} \quad v_3 = \begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix} \quad v_4 = \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}.$$

T is the linear transformation that transposes every 2 by 2 matrix. What is the matrix A 
that represents T in this basis (output basis = input basis)? What is the inverse matrix 
$A^{-1}$? What is the transformation $T^{-1}$ that inverts the transpose operation? 

Solution Transposing those four “basis matrices” just reverses $v_2$ and $v_3$: 


$$\begin{aligned} T(v_1) &= v_1 \\ T(v_2) &= v_3 \\ T(v_3) &= v_2 \\ T(v_4) &= v_4 \end{aligned} \quad \text{gives the four columns of} \quad A = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}.$$


The inverse matrix $A^{-1}$ is the same as A. The inverse transformation $T^{-1}$ is the same as 
T. If we transpose and transpose again, the final output equals the original input. 
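
The same construction works when the “vectors” are matrices. In this hedged Python sketch, a 2 by 2 matrix is identified with its coordinate list (entries read row by row), which is exactly its coordinate vector in the basis above; `coords` and `transpose` are illustrative helper names.

```python
basis = [[[1, 0], [0, 0]],
         [[0, 1], [0, 0]],
         [[0, 0], [1, 0]],
         [[0, 0], [0, 1]]]

def transpose(m):
    return [[m[j][i] for j in range(2)] for i in range(2)]

def coords(m):
    """Coordinates of a 2 by 2 matrix in the basis v1, v2, v3, v4."""
    return [m[0][0], m[0][1], m[1][0], m[1][1]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

# Column j of A holds the coordinates of T(v_j).
columns = [coords(transpose(b)) for b in basis]
A = [[columns[j][i] for j in range(4)] for i in range(4)]
A_squared = matmul(A, A)
print(A)
print(A_squared)
```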


Problem Set 7.2 


Questions 1—4 extend the first derivative example to higher derivatives. 


1 The transformation S takes the second derivative. Keep $1, x, x^2, x^3$ as the basis 
$v_1, v_2, v_3, v_4$ and also as $w_1, w_2, w_3, w_4$. Write $Sv_1, Sv_2, Sv_3, Sv_4$ in terms of 
the w’s. Find the 4 by 4 matrix B for S. 


2 What functions have $v'' = 0$? They are in the kernel of the second derivative S. 
What vectors are in the nullspace of its matrix B in Problem 1? 


3 B is not the square of a rectangular first derivative matrix: 

$$A = \begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 3 \end{bmatrix} \text{ does not allow } A^2.$$

Add a zero row to A, so that output space = input space. Compare $A^2$ with B. 
Conclusion: For $B = A^2$ we want output basis = ______ basis. Then m = n. 

4 (a) The product TS of first and second derivatives produces the third derivative. 
Add zeros to make 4 by 4 matrices, then compute AB. 
(b) The matrix $B^2$ corresponds to $S^2$ = fourth derivative. Why is this zero? 


Questions 5—9 are about a particular T and its matrix A. 


5 With bases $v_1, v_2, v_3$ and $w_1, w_2, w_3$, suppose $T(v_1) = w_2$ and $T(v_2) = T(v_3) = 
w_1 + w_3$. T is a linear transformation. Find the matrix A and multiply by the 
vector (1, 1, 1). What is the output from T when the input is $v_1 + v_2 + v_3$? 


6 Since $T(v_2) = T(v_3)$, the solutions to T(v) = 0 are v = ______. What vectors are 
in the nullspace of A? Find all solutions to $T(v) = w_2$. 


7 Find a vector that is not in the column space of A. Find a combination of w’s that is 
not in the range of T. 


8 You don’t have enough information to determine $T^2$. Why is its matrix not necessarily 
$A^2$? What more information do you need? 

9 Find the rank of A. This is not the dimension of the output space W. It is the 
dimension of the ______ of T. 


Questions 10-13 are about invertible linear transformations. 


10 Suppose $T(v_1) = w_1 + w_2 + w_3$ and $T(v_2) = w_2 + w_3$ and $T(v_3) = w_3$. Find 
the matrix A for T using these basis vectors. What input vector v gives $T(v) = w_1$? 


11 Invert the matrix A in Problem 10. Also invert the transformation T: what are 
$T^{-1}(w_1)$ and $T^{-1}(w_2)$ and $T^{-1}(w_3)$? 


12 Which of these are true and why is the other one ridiculous? 
(a) $T^{-1}T = I$ (b) $T^{-1}(T(v_1)) = v_1$ (c) $T^{-1}(T(w_1)) = w_1$. 

13 Suppose the spaces V and W have the same basis $v_1, v_2$. 

(a) Describe a transformation T (not I) that is its own inverse. 
(b) Describe a transformation T (not I) that equals $T^2$. 
(c) Why can’t the same T be used for both (a) and (b)? 


Questions 14-19 are about changing the basis. 


14 (a) What matrix transforms (1, 0) into (2, 5) and transforms (0, 1) to (1, 3)? 
(b) What matrix transforms (2, 5) to (1, 0) and (1, 3) to (0, 1)? 
(c) Why does no matrix transform (2, 6) to (1, 0) and (1, 3) to (0, 1)? 


15 (a) What matrix M transforms (1, 0) and (0, 1) to (r, t) and (s, u)? 
(b) What matrix N transforms (a, c) and (b, d) to (1, 0) and (0, 1)? 
(c) What condition on a, b, c, d will make part (b) impossible? 

16 (a) How do M and N in Problem 15 yield the matrix that transforms (a, c) to (r, t) 
and (b, d) to (s, u)? 
(b) What matrix transforms (2, 5) to (1, 1) and (1, 3) to (0, 2)? 


17 If you keep the same basis vectors but put them in a different order, the change of 
basis matrix M is a ______ matrix. If you keep the basis vectors in order but change 
their lengths, M is a ______ matrix. 


18 The matrix that rotates the axis vectors (1,0) and (0,1) through an angle $\theta$ is Q. 
What are the coordinates (a, b) of the original (1,0) using the new (rotated) axes? 
This inverse can be tricky. Draw a figure or solve for a and b: 


$$Q = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \qquad \begin{bmatrix} 1 \\ 0 \end{bmatrix} = a \begin{bmatrix} \cos\theta \\ \sin\theta \end{bmatrix} + b \begin{bmatrix} -\sin\theta \\ \cos\theta \end{bmatrix}.$$

19 The matrix that transforms (1,0) and (0,1) to (1,4) and (1,5) is M = ______. 
The combination a(1,4) + b(1,5) that equals (1,0) has (a, b) = ( , ). 
How are those new coordinates of (1,0) related to M or $M^{-1}$? 
Questions 20-23 are about the space of quadratic polynomials $A + Bx + Cx^2$. 

20 The parabola $w_1 = \tfrac12(x^2 + x)$ equals one at x = 1, and zero at x = 0 and x = -1. 
Find the parabolas $w_2$, $w_3$, and then find y(x) by linearity. 
(a) $w_2$ equals one at x = 0 and zero at x = 1 and x = -1. 
(b) $w_3$ equals one at x = -1 and zero at x = 0 and x = 1. 
(c) y(x) equals 4 at x = 1 and 5 at x = 0 and 6 at x = -1. Use $w_1, w_2, w_3$. 


21 One basis for second-degree polynomials is $v_1 = 1$ and $v_2 = x$ and $v_3 = x^2$. 
Another basis is $w_1, w_2, w_3$ from Problem 20. Find two change of basis matrices, 
from the w’s to the v’s and from the v’s to the w’s. 


22 What are the three equations for A, B, C if the parabola $Y = A + Bx + Cx^2$ equals 
4 at x = a and 5 at x = b and 6 at x = c? Find the determinant of the 3 by 3 matrix. 
That matrix transforms values like 4, 5, 6 to parabolas, or is it the other way? 


23 Under what condition on the numbers $m_1, m_2, \ldots, m_9$ do these three parabolas give 
a basis for the space of all parabolas? 


$$v_1 = m_1 + m_2 x + m_3 x^2, \quad v_2 = m_4 + m_5 x + m_6 x^2, \quad v_3 = m_7 + m_8 x + m_9 x^2.$$


24 The Gram-Schmidt process changes a basis $a_1, a_2, a_3$ to an orthonormal basis 
$q_1, q_2, q_3$. These are columns in A = QR. Show that R is the change of basis 
matrix from the a’s to the q’s ($a_2$ is what combination of q’s when A = QR?). 


25 Elimination changes the rows of A to the rows of U with A = LU. Row 2 of A is 
what combination of the rows of U? Writing $A^T = U^T L^T$ to work with columns, 
the change of basis matrix is $M = L^T$. (We have bases provided the matrices are 
______.) 


26 Suppose $v_1, v_2, v_3$ are eigenvectors for T. This means $T(v_i) = \lambda_i v_i$ for i = 
1, 2, 3. What is the matrix for T when the input and output bases are the v’s? 


27 Every invertible linear transformation can have I as its matrix! Choose any input 
basis $v_1, \ldots, v_n$. For output basis choose $w_i = T(v_i)$. Why must T be invertible? 


28 Using $v_1 = w_1$ and $v_2 = w_2$ find the standard matrix for these T’s: 
(a) $T(v_1) = 0$ and $T(v_2) = 3v_1$ (b) $T(v_1) = v_1$ and $T(v_1 + v_2) = v_1$. 


29 Suppose T is reflection across the x axis and S is reflection across the y axis. The 
domain V is the xy plane. If v = (x, y) what is S(T(v))? Find a simpler description 
of the product ST. 

30 Suppose T is reflection across the 45° line, and S is reflection across the y axis. 
If v = (2, 1) then T(v) = (1, 2). Find S(T(v)) and T(S(v)). This shows that 
generally $ST \neq TS$. 


31 Show that the product ST of two reflections is a rotation. Multiply these reflection 
matrices to find the rotation angle: 


$$\begin{bmatrix} \cos 2\theta & \sin 2\theta \\ \sin 2\theta & -\cos 2\theta \end{bmatrix} \begin{bmatrix} \cos 2\alpha & \sin 2\alpha \\ \sin 2\alpha & -\cos 2\alpha \end{bmatrix}.$$

32 True or false: If we know T(v) for n different nonzero vectors in $\mathbf{R}^n$, then we know 
T(v) for every vector in $\mathbf{R}^n$. 


33 Express e = (1, 0, 0, 0) and v = (1, -1, 1, -1) in the wavelet basis, as in equations 
(8-10). The coefficients $c_1, c_2, c_3, c_4$ solve Wc = e and Wc = v. 


34 To represent v = (7, 5, 3, 1) in the wavelet basis, start with (6, 6, 2, 2) + (1, -1, 1, -1). 
Then write 6, 6, 2, 2 as an overall average plus a difference, using 1, 1, 1, 1 and 
1, 1, -1, -1. 


35 What are the eight vectors in the wavelet basis for $\mathbf{R}^8$? They include the long wavelet 
(1, 1, 1, 1, -1, -1, -1, -1) and the short wavelet (1, -1, 0, 0, 0, 0, 0, 0). 


36 Suppose we have two bases $v_1, \ldots, v_n$ and $w_1, \ldots, w_n$ for $\mathbf{R}^n$. If a vector has 
coefficients $b_i$ in one basis and $c_i$ in the other basis, what is the change of basis 
matrix in b = Mc? Start from 


$$b_1 v_1 + \cdots + b_n v_n = Vb = c_1 w_1 + \cdots + c_n w_n = Wc.$$


Your answer represents T(v) = v with input basis of v’s and output basis of w’s. 
Because of different bases, the matrix is not I. 


Challenge Problems 


37 The space M of 2 by 2 matrices has the basis $v_1, v_2, v_3, v_4$ in Worked 
Example 7.2 B. Suppose T multiplies each matrix by $\begin{bmatrix} a & b \\ c & d \end{bmatrix}$. What 4 by 4 matrix 
A represents this transformation T on matrix space? 


38 Suppose A is a 3 by 4 matrix of rank r = 2, and T(v) = Av. Choose input basis 
vectors $v_1, v_2$ from the row space of A and $v_3, v_4$ from the nullspace. Choose output 
basis $w_1 = Av_1$, $w_2 = Av_2$ in the column space and $w_3$ from the nullspace of $A^T$. 
What specially simple matrix represents this T in these special bases? 

7.3 Diagonalization and the Pseudoinverse 


This section produces better matrices by choosing better bases. When the goal is a diagonal 
matrix, one way is a basis of eigenvectors. The other way is two bases (the input and output 
bases are different). Those left and right singular vectors are orthonormal basis vectors for 
the four fundamental subspaces of A. They come from the SVD. 

By reversing those input and output bases, we will find the “pseudoinverse” of A. 
This matrix $A^+$ sends $\mathbf{R}^m$ back to $\mathbf{R}^n$, and it sends column space back to row space. 


The truth is that all our great factorizations of A can be regarded as a change of basis. 
But this is a short section, so we concentrate on the two outstanding examples. In both 
cases the good matrix is diagonal. It is $\Lambda$ with one basis or $\Sigma$ with two bases. 


1. $S^{-1}AS = \Lambda$ when the input and output bases are eigenvectors of A. 
2. $U^{-1}AV = \Sigma$ when those bases are eigenvectors of $A^TA$ and $AA^T$. 


You see immediately the difference between $\Lambda$ and $\Sigma$. In $\Lambda$ the bases are the same. 
Then m = n and the matrix A must be square. And some square matrices cannot be 
diagonalized by any S, because they don’t have n independent eigenvectors. 


In $\Sigma$ the input and output bases are different. The matrix A can be rectangular. 
The bases are orthonormal because $A^TA$ and $AA^T$ are symmetric. Then $U^{-1} = U^T$ 
and $V^{-1} = V^T$. Every matrix A is allowed, and A has the diagonal form $\Sigma$. 
This is the Singular Value Decomposition (SVD) of Section 6.7. 


The eigenvector basis is orthonormal only when $A^TA = AA^T$ (a “normal” matrix). 
That includes symmetric and antisymmetric and orthogonal matrices (special might be a 
better word than normal). In this case the singular values in $\Sigma$ are the absolute values 
$\sigma_i = |\lambda_i|$, so that $\Sigma = \text{abs}(\Lambda)$. The two diagonalizations are the same when $A^TA = AA^T$, 
except for possible factors -1 (real) and $e^{i\theta}$ (complex). 


I will just note that the Gram-Schmidt factorization A = QR chooses only one new 
basis. That is the orthogonal output basis given by Q. The input uses the standard basis 
given by I. We don’t reach a diagonal $\Sigma$, but we do reach a triangular R. The output basis 
matrix appears on the left and the input basis appears on the right, in A = QRI. 

We start with input basis equal to output basis. That will produce S and $S^{-1}$. 


Similar Matrices: A and $S^{-1}AS$ and $W^{-1}AW$ 


Begin with a square matrix and one basis. The input space V is $\mathbf{R}^n$ and the output space W 
is also $\mathbf{R}^n$. The standard basis vectors are the columns of I. The matrix is n by n, and we 
call it A. The linear transformation T is “multiplication by A”. 

Most of this book has been about one fundamental problem: to make the matrix simple. 
We made it triangular in Chapter 2 (by elimination) and Chapter 4 (by Gram-Schmidt). 
We made it diagonal in Chapter 6 (by eigenvectors). Now that change from A to $\Lambda$ 
comes from a change of basis: Eigenvalue matrix from eigenvector basis. 

Here are the main facts in advance. When you change the basis for V, the matrix 
changes from A to AM. Because V is the input space, the matrix M goes on the right (to 
come first). When you change the basis for W, the new matrix is $M^{-1}A$. We are working 
with the output space so $M^{-1}$ is on the left (to come last). 

If you change both bases in the same way, the new matrix is $M^{-1}AM$. The good 
basis vectors are the eigenvectors of A, when the matrix becomes $S^{-1}AS = \Lambda$. 


When the basis contains the eigenvectors $x_1, \ldots, x_n$, the matrix for T becomes $\Lambda$. 


Reason To find column 1 of the matrix, input the first basis vector $x_1$. The transformation 
multiplies by A. The output is $Ax_1 = \lambda_1 x_1$. This is $\lambda_1$ times the first basis vector plus 
zero times the other basis vectors. Therefore the first column of the matrix is $(\lambda_1, 0, \ldots, 0)$. 
In the eigenvector basis, the matrix is diagonal. 


Example 1 Project onto the line y = -x that goes from northwest to southeast. 
The vector (1,0) projects to (.5, -.5) on that line. The projection of (0, 1) is (-.5, .5): 


1. Standard matrix: Project standard basis $A = \begin{bmatrix} .5 & -.5 \\ -.5 & .5 \end{bmatrix}$. 


2. Find the diagonal matrix $\Lambda$ in the eigenvector basis. 


Solution The eigenvectors for this projection are $x_1 = (1,-1)$ and $x_2 = (1,1)$. The 
first eigenvector lies on the 135° line and the second is perpendicular (on the 45° line). 
Their projections are $x_1$ and 0. The eigenvalues are $\lambda_1 = 1$ and $\lambda_2 = 0$. 


2. Diagonalized matrix: Project eigenvectors $\Lambda = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}$. 


3. Find a third matrix B using another basis $v_1 = w_1 = (2,0)$ and $v_2 = w_2 = (1,1)$. 


Solution $w_1$ is not an eigenvector, so the matrix B in this basis will not be diagonal. 
The first way to compute B follows the rule of Section 7.2: 


Find column j of the matrix by writing the projection $T(v_j)$ as a combination of w’s. 


Apply the projection T to (2, 0). The result is (1, -1) which is $w_1 - w_2$. So the first column 
of B contains 1 and -1. The second vector $w_2 = (1,1)$ projects to zero, so the second 
column of B contains 0 and 0. The eigenvalues must stay at 1 and 0: 


3. Third similar matrix: Project $w_1$ and $w_2$ $\quad B = \begin{bmatrix} 1 & 0 \\ -1 & 0 \end{bmatrix}$. (1) 


The second way to find the same B is more insightful. Use $W^{-1}$ and W to change 
between the standard basis and the basis of w’s. Those change of basis matrices are 
representing the identity transformation! The product of transformations is just ITI. 
The product of matrices is $B = W^{-1}AW$. This approach shows that B is similar to A. 


For any basis $w_1, \ldots, w_n$ find the matrix B in three steps. Change the input basis 
to the standard basis with W. The matrix in the standard basis is A. Then change the 
output basis back to the w’s with $W^{-1}$. Then $B = W^{-1}AW$ represents ITI: 


$$B_{\text{w's to w's}} = W^{-1}_{\text{standard to w's}} \; A_{\text{standard}} \; W_{\text{w's to standard}} \tag{2}$$


A change of basis produces a similarity transformation to $W^{-1}AW$ in the matrix. 


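Example 2 (continuing with the projection) Apply this $W^{-1}AW$ rule to find B, when 
the basis (2, 0) and (1, 1) is in the columns of W: 


$$W^{-1}AW = \begin{bmatrix} \tfrac12 & -\tfrac12 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} \tfrac12 & -\tfrac12 \\ -\tfrac12 & \tfrac12 \end{bmatrix} \begin{bmatrix} 2 & 1 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ -1 & 0 \end{bmatrix}.$$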


The $W^{-1}AW$ rule has produced the same B as in equation (1). The matrices A and B are 
similar. They have the same eigenvalues (1 and 0). And $\Lambda$ is similar too. 

Notice that the projection matrix keeps the property $A^2 = A$ and $B^2 = B$ and $\Lambda^2 = \Lambda$. 
The second projection doesn’t move the first projection. 
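
The three-step rule is easy to check with exact arithmetic. This Python sketch is an illustration using the projection matrix A and the basis matrix W of Example 2; `inverse_2x2` is a hypothetical helper based on the standard 2 by 2 inverse formula.

```python
from fractions import Fraction as Fr

def matmul(A, B):
    """Plain matrix product for lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def inverse_2x2(M):
    """Inverse of a 2 by 2 matrix with nonzero determinant."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[Fr(d) / det, Fr(-b) / det],
            [Fr(-c) / det, Fr(a) / det]]

A = [[Fr(1, 2), Fr(-1, 2)],      # projection onto y = -x, standard basis
     [Fr(-1, 2), Fr(1, 2)]]
W = [[2, 1],                     # columns are w1 = (2,0) and w2 = (1,1)
     [0, 1]]

# w's to standard (W), project (A), standard back to w's (W inverse).
B = matmul(matmul(inverse_2x2(W), A), W)
print(B)
print(matmul(B, B) == B)         # B is still a projection
```

Similar matrices share trace and determinant, so B has trace 1 and determinant 0 like A, consistent with eigenvalues 1 and 0.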


The Singular Value Decomposition (SVD) 


Now the input basis $v_1, \ldots, v_n$ can be different from the output basis $u_1, \ldots, u_m$. In fact 
the input space $\mathbf{R}^n$ can be different from the output space $\mathbf{R}^m$. Again the best matrix is 
diagonal (now m by n). To achieve this diagonal matrix $\Sigma$, each input vector $v_j$ must 
transform into a multiple of the output vector $u_j$. That multiple is the singular value $\sigma_j$ 
on the main diagonal of $\Sigma$: 


$$\textbf{SVD} \qquad Av_j = \begin{cases} \sigma_j u_j & \text{for } j \le r \\ 0 & \text{for } j > r \end{cases} \qquad \text{with orthonormal bases.} \tag{3}$$


The singular values are in the order $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_r$. The rank r enters because (by 
definition) singular values are not zero. The second part of the equation says that $v_j$ is in 
the nullspace for j = r+1, ..., n. This gives the correct number n - r of basis vectors 
for the nullspace. 

Let me connect the matrices with the linear transformations they represent. A and
$\Sigma$ represent the same transformation. $A = U\Sigma V^T$ uses the standard bases for $\mathbf{R}^n$ and
$\mathbf{R}^m$. The diagonal $\Sigma$ uses the input basis of v's and the output basis of u's. The orthogonal
matrices V and U give the basis changes; they represent the identity transformations
(in $\mathbf{R}^n$ and $\mathbf{R}^m$). The product of transformations is $ITI$, and it is represented in the
v and u bases by $U^{-1}AV$ which is $\Sigma$.

The matrix $\Sigma$ in the v and u bases comes from A in the standard bases by $U^{-1}AV$:


$$\Sigma_{v\text{'s to }u\text{'s}} \;=\; U^{-1}_{\text{standard to }u\text{'s}} \;\; A_{\text{standard}} \;\; V_{v\text{'s to standard}} \qquad (4)$$


The SVD chooses orthonormal bases ($U^{-1} = U^T$ and $V^{-1} = V^T$) that diagonalize A.


The two orthonormal bases in the SVD are the eigenvector bases for $A^TA$ (the v's) and
$AA^T$ (the u's). Since those are symmetric matrices, their unit eigenvectors are orthonormal.
Their eigenvalues are the numbers $\sigma_j^2$. Equations (10) and (11) in Section 6.7 proved that
those bases diagonalize the standard matrix A to produce $\Sigma$.
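A numerical check of these statements, as a sketch only. It uses the 2 by 2 matrix whose SVD is quoted in Example 3; the helper names and the roundoff tolerance in `close` are assumptions made for this illustration.

```python
import math

def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(X):
    return [list(row) for row in zip(*X)]

def close(X, Y, tol=1e-12):
    """Entrywise comparison that tolerates floating point roundoff."""
    return all(math.isclose(x, y, abs_tol=tol)
               for rx, ry in zip(X, Y) for x, y in zip(rx, ry))

s = 1 / math.sqrt(2)
A = [[2.0, 2.0], [-1.0, 1.0]]
U = [[0.0, 1.0], [1.0, 0.0]]                        # columns u1, u2
V = [[-s, s], [s, s]]                               # columns v1, v2
Sigma = [[math.sqrt(2), 0.0], [0.0, 2 * math.sqrt(2)]]

# Sigma = U^{-1} A V, and U^{-1} = U^T because U is orthogonal.
print(close(matmul(transpose(U), matmul(A, V)), Sigma))

# The v's are eigenvectors of A^T A with eigenvalues sigma_j^2 (here 2 and 8).
AtA = matmul(transpose(A), A)
print(close(matmul(AtA, V), matmul(V, matmul(Sigma, Sigma))))
```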


Polar Decomposition 


Every complex number has the polar form $re^{i\theta}$. A nonnegative number r multiplies a
number on the unit circle. (Remember that $|e^{i\theta}| = |\cos\theta + i\sin\theta| = 1$.) Thinking of
these numbers as 1 by 1 matrices, $r \ge 0$ corresponds to a positive semidefinite matrix
(call it H) and $e^{i\theta}$ corresponds to an orthogonal matrix Q. The polar decomposition
extends this factorization to matrices: orthogonal times semidefinite, A = QH.


Every real square matrix can be factored into A = QH, where Q is orthogonal
and H is symmetric positive semidefinite. If A is invertible, H is positive definite.


For the proof we just insert $V^TV = I$ into the middle of the SVD:

$$\textbf{Polar decomposition} \qquad A = U\Sigma V^T = (UV^T)(V\Sigma V^T) = (Q)(H). \qquad (5)$$


The first factor $UV^T$ is Q. The product of orthogonal matrices is orthogonal. The second
factor $V\Sigma V^T$ is H. It is positive semidefinite because its eigenvalues are in $\Sigma$.
If A is invertible then $\Sigma$ and H are also invertible. H is the symmetric positive definite
square root of $A^TA$. Equation (5) says that $H^2 = V\Sigma^2V^T = A^TA$.

There is also a polar decomposition A = KQ in the reverse order. Q is the same but
now $K = U\Sigma U^T$. This is the symmetric positive definite square root of $AA^T$.


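Example 3 Find the polar decomposition A = QH from its SVD in Section 6.7:


$$A = \begin{bmatrix} 2 & 2 \\ -1 & 1 \end{bmatrix} = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} \sqrt2 & 0 \\ 0 & 2\sqrt2 \end{bmatrix} \begin{bmatrix} -1/\sqrt2 & 1/\sqrt2 \\ 1/\sqrt2 & 1/\sqrt2 \end{bmatrix} = U\Sigma V^T.$$

Solution The orthogonal part is $Q = UV^T$. The positive definite part is $H = V\Sigma V^T$.
This is also $H = Q^{-1}A$ which is $Q^TA$ because Q is orthogonal:


$$\textbf{Orthogonal} \qquad Q = UV^T = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} -1/\sqrt2 & 1/\sqrt2 \\ 1/\sqrt2 & 1/\sqrt2 \end{bmatrix} = \begin{bmatrix} 1/\sqrt2 & 1/\sqrt2 \\ -1/\sqrt2 & 1/\sqrt2 \end{bmatrix}$$


$$\textbf{Positive definite} \qquad H = Q^TA = \begin{bmatrix} 1/\sqrt2 & -1/\sqrt2 \\ 1/\sqrt2 & 1/\sqrt2 \end{bmatrix} \begin{bmatrix} 2 & 2 \\ -1 & 1 \end{bmatrix} = \begin{bmatrix} 3/\sqrt2 & 1/\sqrt2 \\ 1/\sqrt2 & 3/\sqrt2 \end{bmatrix}$$

In mechanics, the polar decomposition separates the rotation (in Q) from the stretching
(in H). The eigenvalues of H are the singular values of A. They give the stretching factors.
The eigenvectors of H are the eigenvectors of $A^TA$. They give the stretching directions
(the principal axes). Then Q rotates those axes.

The polar decomposition just splits the key equation $Av_i = \sigma_iu_i$ into two steps.
The “H” part multiplies $v_i$ by $\sigma_i$. The “Q” part swings $v_i$ around into $u_i$.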
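The two polar factors can be assembled directly from U, $\Sigma$, V and checked in floating point. This is a sketch under the assumption that the SVD factors are already known (here the ones quoted in Example 3); the helpers are illustrative and not part of any library.

```python
import math

def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(X):
    return [list(row) for row in zip(*X)]

def close(X, Y, tol=1e-12):
    """Entrywise comparison that tolerates floating point roundoff."""
    return all(math.isclose(x, y, abs_tol=tol)
               for rx, ry in zip(X, Y) for x, y in zip(rx, ry))

s = 1 / math.sqrt(2)
A = [[2.0, 2.0], [-1.0, 1.0]]
U = [[0.0, 1.0], [1.0, 0.0]]
V = [[-s, s], [s, s]]
Sigma = [[math.sqrt(2), 0.0], [0.0, 2 * math.sqrt(2)]]

Q = matmul(U, transpose(V))                     # rotation part U V^T
H = matmul(V, matmul(Sigma, transpose(V)))      # stretching part V Sigma V^T
K = matmul(U, matmul(Sigma, transpose(U)))      # reverse order: A = K Q

print(close(matmul(Q, H), A))                            # A = Q H
print(close(matmul(K, Q), A))                            # A = K Q
print(close(matmul(H, H), matmul(transpose(A), A)))      # H^2 = A^T A
```

Both orders use the same Q. Only the symmetric factor changes, from H (built on the v's) to K (built on the u's).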


The Pseudoinverse 


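By choosing good bases, A multiplies $v_i$ in the row space to give $\sigma_iu_i$ in the column space.
$A^{-1}$ must do the opposite! If $Av = \sigma u$ then $A^{-1}u = v/\sigma$. The singular values of $A^{-1}$
are $1/\sigma$, just as the eigenvalues of $A^{-1}$ are $1/\lambda$. The bases are reversed. The u's are in the
row space of $A^{-1}$, the v's are in the column space.
Until this moment we would have added “if $A^{-1}$ exists.” Now we don't.
A matrix that multiplies $u_i$ to produce $v_i/\sigma_i$ does exist. It is the pseudoinverse $A^+$:


$$\textbf{Pseudoinverse} \qquad A^+ = V\Sigma^+U^T = \underbrace{\begin{bmatrix} v_1 \cdots v_r \cdots v_n \end{bmatrix}}_{n \text{ by } n} \; \underbrace{\begin{bmatrix} \sigma_1^{-1} & & & \\ & \ddots & & \\ & & \sigma_r^{-1} & \\ & & & 0 \end{bmatrix}}_{n \text{ by } m} \; \underbrace{\begin{bmatrix} u_1 \cdots u_r \cdots u_m \end{bmatrix}^T}_{m \text{ by } m}$$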


The pseudoinverse $A^+$ is an n by m matrix. If $A^{-1}$ exists (we said it again), then $A^+$ is the
same as $A^{-1}$. In that case m = n = r and we are inverting $U\Sigma V^T$ to get $V\Sigma^{-1}U^T$. The
new symbol $A^+$ is needed when r < m or r < n. Then A has no two-sided inverse, but it
has a pseudoinverse $A^+$ with that same rank r:


$$A^+u_i = \frac{1}{\sigma_i}v_i \quad \text{for } i \le r \qquad \text{and} \qquad A^+u_i = 0 \quad \text{for } i > r.$$

The vectors $u_1, \ldots, u_r$ in the column space of A go back to $v_1, \ldots, v_r$ in the row space.
The other vectors $u_{r+1}, \ldots, u_m$ are in the left nullspace, and $A^+$ sends them to zero.


When we know what happens to each basis vector $u_i$, we know $A^+$.

Notice the pseudoinverse $\Sigma^+$ of the diagonal matrix $\Sigma$. Each $\sigma$ is replaced by $\sigma^{-1}$. The
product $\Sigma^+\Sigma$ is as near to the identity as we can get (it is a projection matrix,
$\Sigma^+\Sigma$ is partly I and partly 0). We get r 1's. We can't do anything about the zero rows and
columns. This example has $\sigma_1 = 2$ and $\sigma_2 = 3$:


$$\Sigma^+\Sigma = \begin{bmatrix} 1/2 & 0 & 0 \\ 0 & 1/3 & 0 \\ 0 & 0 & 0 \end{bmatrix} \begin{bmatrix} 2 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 0 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix} = \begin{bmatrix} I & 0 \\ 0 & 0 \end{bmatrix}.$$


The pseudoinverse $A^+$ is the n by m matrix that makes $AA^+$ and $A^+A$ into projections:


[Figure 7.4 diagram: the pseudoinverse $A^+$ takes p in the column space back to $A^+p = x^+$ in the row space, and the nullspace of $A^T$ to zero. $A^+A = \begin{bmatrix} I & 0 \\ 0 & 0 \end{bmatrix}$ acts on the row space and the nullspace.]


Figure 7.4: $Ax^+$ in the column space goes back to $A^+Ax^+ = x^+$ in the row space.


$AA^+$ = projection matrix onto the column space of A
$A^+A$ = projection matrix onto the row space of A


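Example 4 Find the pseudoinverse of $A = \begin{bmatrix} 2 & 2 \\ 1 & 1 \end{bmatrix}$. This matrix is not invertible. The
rank is 1. The only singular value is $\sqrt{10}$. That is inverted to $1/\sqrt{10}$ in $\Sigma^+$:


$$A^+ = V\Sigma^+U^T = \frac{1}{\sqrt2}\begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} \begin{bmatrix} 1/\sqrt{10} & 0 \\ 0 & 0 \end{bmatrix} \frac{1}{\sqrt5}\begin{bmatrix} 2 & 1 \\ 1 & -2 \end{bmatrix} = \frac{1}{10}\begin{bmatrix} 2 & 1 \\ 2 & 1 \end{bmatrix}.$$


$A^+$ also has rank 1. Its column space is the row space of A. When A takes (1, 1) in the row
space to (4, 2) in the column space, $A^+$ does the reverse. $A^+(4, 2) = (1, 1)$.


Every rank one matrix is a column times a row. With unit vectors u and v, that is
$A = \sigma uv^T$. Then the best inverse of a rank one matrix is $A^+ = vu^T/\sigma$. The product
$AA^+$ is $uu^T$, the projection onto the line through u. The product $A^+A$ is $vv^T$.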
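For a rank one matrix written as (column c) times (row r), the formula $A^+ = vu^T/\sigma$ becomes $rc^T/(\|c\|^2\|r\|^2)$, which needs no square roots. The sketch below applies that to Example 4 with exact fractions; the function name `rank_one_pinv` is an assumption for this illustration.

```python
from fractions import Fraction as F

def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def rank_one_pinv(col, row):
    """Pseudoinverse of A = col * row^T, namely row * col^T / (|col|^2 |row|^2)."""
    scale = sum(c * c for c in col) * sum(r * r for r in row)
    return [[F(r * c, scale) for c in col] for r in row]

col, row = [2, 1], [1, 1]
A = [[c * r for r in row] for c in col]      # [[2, 2], [1, 1]]
A_plus = rank_one_pinv(col, row)             # entries 1/5 and 1/10

print(matmul(A_plus, [[4], [2]]))            # back to (1, 1) in the row space
P_col = matmul(A, A_plus)                    # projection onto the line through (2, 1)
P_row = matmul(A_plus, A)                    # projection onto the line through (1, 1)
print(matmul(P_col, P_col) == P_col, matmul(P_row, P_row) == P_row)
```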


Application to least squares Chapter 4 found the best solution $\hat{x}$ to an unsolvable system
Ax = b. The key equation is $A^TA\hat{x} = A^Tb$, with the assumption that $A^TA$ is invertible.
The zero vector was alone in the nullspace.

Now A may have dependent columns (rank < n). There can be many solutions to
$A^TA\hat{x} = A^Tb$. One solution is $x^+ = A^+b$ from the pseudoinverse. We can check that
$A^TAA^+b$ is $A^Tb$, because Figure 7.4 shows that $e = b - AA^+b$ is the part of b in the
nullspace of $A^T$. Any vector in the nullspace of A could be added to $x^+$, to give another
solution $\hat{x}$ to $A^TA\hat{x} = A^Tb$. But $x^+$ will be shorter than any other $\hat{x}$ (Problem 16):


The shortest least squares solution to Ax = b is $x^+ = A^+b$.


The pseudoinverse $A^+$ and this best solution $x^+$ are essential in statistics, because
experiments often have a matrix A with dependent columns.
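A small exact check of the "shortest solution" statement, as a sketch. It reuses the rank one matrix of Example 4 and its pseudoinverse; the right side b = (3, 4) is a hypothetical choice made only for this illustration.

```python
from fractions import Fraction as F

def matvec(M, x):
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

A = [[2, 2], [1, 1]]                                   # dependent columns, rank 1
A_T = [[2, 1], [2, 1]]
A_plus = [[F(1, 5), F(1, 10)], [F(1, 5), F(1, 10)]]    # pseudoinverse from Example 4
b = [3, 4]                                             # hypothetical right side

x_plus = matvec(A_plus, b)
x_other = [x_plus[0] + 1, x_plus[1] - 1]               # add (1, -1) from the nullspace of A

def normal_residual(x):
    """A^T A x - A^T b, which is zero for every least squares solution."""
    lhs = matvec(A_T, matvec(A, x))
    rhs = matvec(A_T, b)
    return [l - r for l, r in zip(lhs, rhs)]

def length_squared(x):
    return sum(xi * xi for xi in x)

print(normal_residual(x_plus), normal_residual(x_other))   # both are zero vectors
print(length_squared(x_plus), length_squared(x_other))     # x_plus is the shorter one
```

Both vectors solve the normal equations, but the nullspace component (1, -1) is orthogonal to $x^+$ and only adds length.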


REVIEW OF THE KEY IDEAS


1. Diagonalization $S^{-1}AS = \Lambda$ is the same as a change to the eigenvector basis.


2. The SVD chooses an input basis of v's and an output basis of u's. Those orthonormal
bases diagonalize A. This is $Av_i = \sigma_iu_i$, and in matrix form $A = U\Sigma V^T$.


3. Polar decomposition factors A into QH, rotation $UV^T$ times stretching $V\Sigma V^T$.


4. The pseudoinverse $A^+ = V\Sigma^+U^T$ transforms the column space of A back to its
row space. $A^+A$ is the identity on the row space (and zero on the nullspace).


WORKED EXAMPLES


7.3 A If A has rank n (full column rank) then it has a left inverse $C = (A^TA)^{-1}A^T$.
This matrix C gives CA = I. Explain why the pseudoinverse is $A^+ = C$ in this case.
If A has rank m (full row rank) then it has a right inverse B with $B = A^T(AA^T)^{-1}$.
Then AB = I. Explain why $A^+ = B$ in this case.

Find B for $A_1$ and find C for $A_2$. Find $A^+$ for all three matrices $A_1, A_2, A_3$:


a=| 3 | A,=[2 2] 4s=| 3 > | 


Solution If A has rank n (independent columns) then $A^TA$ is invertible. This is a key
point of Section 4.2. Certainly $C = (A^TA)^{-1}A^T$ multiplies A to give CA = I. In the
opposite order, $AC = A(A^TA)^{-1}A^T$ is the projection matrix (Section 4.2 again) onto the
column space. So C meets the requirements to be $A^+$: CA and AC are projections.

If A has rank m (full row rank) then $AA^T$ is invertible. Certainly A multiplies
$B = A^T(AA^T)^{-1}$ to give AB = I. In the opposite order, $BA = A^T(AA^T)^{-1}A$ is the projection
matrix onto the row space. So B is the pseudoinverse $A^+$ with rank m.

The example $A_1$ has full column rank (for C) and $A_2$ has full row rank (for B):


$$A_1^+ = (A_1^TA_1)^{-1}A_1^T = \frac18\begin{bmatrix} 2 & 2 \end{bmatrix} \qquad A_2^+ = A_2^T(A_2A_2^T)^{-1} = \frac18\begin{bmatrix} 2 \\ 2 \end{bmatrix}.$$


Notice $A_1^+A_1 = [1]$ and $A_2A_2^+ = [1]$. But $A_3$ (rank 1) has no left or right inverse.
Its rank is not full. Its pseudoinverse is $A_3^+ = \sigma_1^{-1}v_1u_1^T$ = [11 ]/4.


Problem Set 7.3 


Problems 1-4 compute and use the SVD of a particular matrix (not invertible).


1 (a) Compute $A^TA$ and its eigenvalues and unit eigenvectors $v_1$ and $v_2$. Find $\sigma_1$.


Rank one matrix $A = \begin{bmatrix} 1 & 2 \\ 3 & 6 \end{bmatrix}$


(b) Compute $AA^T$ and its eigenvalues and unit eigenvectors $u_1$ and $u_2$.


(c) Verify that $Av_1 = \sigma_1u_1$. Put numbers into the SVD:


$$A = U\Sigma V^T \qquad \begin{bmatrix} 1 & 2 \\ 3 & 6 \end{bmatrix} = \begin{bmatrix} u_1 & u_2 \end{bmatrix} \begin{bmatrix} \sigma_1 & 0 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} v_1 & v_2 \end{bmatrix}^T.$$


2 (a) From the u's and v's in Problem 1 write down orthonormal bases for the four
fundamental subspaces of this matrix A.


(b) Describe all matrices that have those same four subspaces. Multiples of A?


3 From U, V, and $\Sigma$ in Problem 1 find the orthogonal matrix $Q = UV^T$ and the
symmetric matrix $H = V\Sigma V^T$. Verify the polar decomposition A = QH. This H
is only semidefinite because ____. Test $H^2 = A^TA$.


4 Compute the pseudoinverse $A^+ = V\Sigma^+U^T$. The diagonal matrix $\Sigma^+$ contains
$1/\sigma_1$. Rename the four subspaces (for A) in Figure 7.4 as four subspaces for $A^+$.
Compute the projections $P_{\text{row}} = A^+A$ and $P_{\text{column}} = AA^+$.


Problems 5-9 are about the SVD of an invertible matrix.


5 Compute $A^TA$ and its eigenvalues and unit eigenvectors $v_1$ and $v_2$. What are the
singular values $\sigma_1$ and $\sigma_2$ for this matrix A?


$$A = \begin{bmatrix} 3 & 3 \\ -1 & 1 \end{bmatrix}.$$


6 $AA^T$ has the same eigenvalues $\sigma_1^2$ and $\sigma_2^2$ as $A^TA$. Find unit eigenvectors $u_1$ and $u_2$.
Put numbers into the SVD:


$$A = \begin{bmatrix} u_1 & u_2 \end{bmatrix} \begin{bmatrix} \sigma_1 & 0 \\ 0 & \sigma_2 \end{bmatrix} \begin{bmatrix} v_1 & v_2 \end{bmatrix}^T.$$


7 In Problem 6, multiply columns times rows to show that $A = \sigma_1u_1v_1^T + \sigma_2u_2v_2^T$.
Prove from $A = U\Sigma V^T$ that every matrix of rank r is the sum of r matrices of rank
one.


8 From U, V, and $\Sigma$ find the orthogonal matrix $Q = UV^T$ and the symmetric matrix
$K = U\Sigma U^T$. Verify the polar decomposition in reverse order A = KQ.


9 The pseudoinverse of this A is the same as ____ because ____.


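Problems 10-11 compute and use the SVD of a 1 by 3 rectangular matrix.


10 Compute $A^TA$ and $AA^T$ and their eigenvalues and unit eigenvectors when the matrix
is $A = \begin{bmatrix} 3 & 4 & 0 \end{bmatrix}$. What are the singular values of A?


11 Put numbers into the singular value decomposition of A:

$$A = \begin{bmatrix} 3 & 4 & 0 \end{bmatrix} = \begin{bmatrix} u_1 \end{bmatrix} \begin{bmatrix} \sigma_1 & 0 & 0 \end{bmatrix} \begin{bmatrix} v_1 & v_2 & v_3 \end{bmatrix}^T.$$

Put numbers into the pseudoinverse $V\Sigma^+U^T$ of A. Compute $AA^+$ and $A^+A$:


$$\textbf{Pseudoinverse} \qquad A^+ = \begin{bmatrix} v_1 & v_2 & v_3 \end{bmatrix} \begin{bmatrix} 1/\sigma_1 \\ 0 \\ 0 \end{bmatrix} \begin{bmatrix} u_1 \end{bmatrix}^T.$$


12 What is the only 2 by 3 matrix that has no pivots and no singular values? What is $\Sigma$
for that matrix? $A^+$ is the zero matrix, but what shape?

13 If det A = 0 why is det $A^+$ = 0? If A has rank r, why does $A^+$ have rank r?


14 When are the factors in $U\Sigma V^T$ the same as in $Q\Lambda Q^T$? The eigenvalues $\lambda_i$ must be
positive, to equal the $\sigma_i$. Then A must be ____ and positive ____.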


Problems 15-18 bring out the main properties of $A^+$ and $x^+ = A^+b$.


15 All matrices in this problem have rank one. The vector b is $(b_1, b_2)$.

$$A = \begin{bmatrix} 2 & 2 \\ 1 & 1 \end{bmatrix} \qquad A^+ = \begin{bmatrix} .2 & .1 \\ .2 & .1 \end{bmatrix} \qquad AA^+ = \begin{bmatrix} .8 & .4 \\ .4 & .2 \end{bmatrix} \qquad A^+A = \begin{bmatrix} .5 & .5 \\ .5 & .5 \end{bmatrix}$$

(a) The equation $A^TA\hat{x} = A^Tb$ has many solutions because $A^TA$ is ____.
(b) Verify that $x^+ = A^+b = (.2b_1 + .1b_2, \; .2b_1 + .1b_2)$ solves $A^TAx^+ = A^Tb$.
(c) Add (1, -1) to that $x^+$ to get another solution to $A^TA\hat{x} = A^Tb$. Show that
$\|\hat{x}\|^2 = \|x^+\|^2 + 2$, and $x^+$ is shorter.


16 The vector $x^+ = A^+b$ is the shortest possible solution to $A^TA\hat{x} = A^Tb$.
Reason: The difference $\hat{x} - x^+$ is in the nullspace of $A^TA$. This is also the nullspace
of A, orthogonal to $x^+$. Explain how it follows that $\|\hat{x}\|^2 = \|x^+\|^2 + \|\hat{x} - x^+\|^2$.


17 Every b in $\mathbf{R}^m$ is p + e. This is the column space part plus the left nullspace part.
Every x in $\mathbf{R}^n$ is $x_r + x_n$ = (row space part) + (nullspace part). Then


$$AA^+p = \_\_\_\_ \qquad AA^+e = \_\_\_\_ \qquad A^+Ax_r = \_\_\_\_ \qquad A^+Ax_n = \_\_\_\_$$


18 Find $A^+$ and $A^+A$ and $AA^+$ and $x^+$ for this 2 by 1 matrix and these b:


g-e e-e] 


19 A general 2 by 2 matrix A is determined by four numbers. If triangular, it is determined
by three. If diagonal, by two. If a rotation, by one. An eigenvector, by one.
Check that the total count is four for each factorization of A:


Four numbers in $LU$, $LDU$, $QR$, $U\Sigma V^T$, $S\Lambda S^{-1}$.


20 Following Problem 19, check that $LDL^T$ and $Q\Lambda Q^T$ are determined by three numbers.
This is correct because the matrix A is now ____.


21 From $A = U\Sigma V^T$ and $A^+ = V\Sigma^+U^T$ explain these splittings into rank 1:


$$A = \sum_{i=1}^{r} \sigma_iu_iv_i^T \qquad A^+ = \sum_{i=1}^{r} \frac{v_iu_i^T}{\sigma_i} \qquad A^+A = \sum_{i=1}^{r} v_iv_i^T \qquad AA^+ = \sum_{i=1}^{r} u_iu_i^T$$


Challenge Problems


22 This problem looks for all matrices A with a given column space in $\mathbf{R}^m$ and a given
row space in $\mathbf{R}^n$. Suppose $c_1, \ldots, c_r$ and $b_1, \ldots, b_r$ are bases for those two spaces.
Make them columns of C and B. The goal is to show that $A = CMB^T$ for an r by
r invertible matrix M. Hint: Start from $A = U\Sigma V^T$. A must have this form:


The first r columns of U and V must be connected to C and B by invertible matrices,
because they contain bases for the same column space and row space.


23 A pair of singular vectors v and u will satisfy $Av = \sigma u$ and $A^Tu = \sigma v$. This means
that the double vector $x = \begin{bmatrix} u \\ v \end{bmatrix}$ is an eigenvector of what symmetric block matrix?
What is the eigenvalue? The SVD of A is equivalent to the diagonalization of that
symmetric block matrix.


Applications


8.1 Matrices in Engineering 


This section will show how engineering problems produce symmetric matrices K (often
K is positive definite). The “linear algebra reason” for symmetry and positive definiteness
is their form $K = A^TA$ and $K = A^TCA$. The “physical reason” is that the expression
$\frac12u^TKu$ represents energy, and energy is never negative. The matrix C, often diagonal,
contains positive physical constants like conductance or stiffness or diffusivity.


Our first examples come from mechanical and civil and aeronautical engineering.
K is the stiffness matrix, and $K^{-1}f$ is the structure's response to forces f from outside.
Section 8.2 turns to electrical engineering, where the matrices come from networks and circuits.
The exercises involve chemical engineering and I could go on! Economics and management
and engineering design come later in this chapter (there the key is optimization).


Engineering leads to linear algebra in two ways, directly and indirectly: 


Direct way The physical problem has only a finite number of pieces. The laws 
connecting their position or velocity are linear (movement is not too big or too fast). 
The laws are expressed by matrix equations. 


Indirect way The physical system is “continuous”. Instead of individual masses, the 
mass density and the forces and the velocities are functions of x or x, y or x, y, Z. 
The laws are expressed by differential equations. To find accurate solutions we 
approximate by finite difference equations or finite element equations. 


Both ways produce matrix equations and linear algebra. I really believe that you cannot 
do modern engineering without matrices. 


Here we present equilibrium equations Ku = f. With motion, $M\,d^2u/dt^2 + Ku = f$
becomes dynamic. Then we use eigenvalues from $Kx = \lambda Mx$, or finite differences.

Before explaining the physical examples, may I write down the matrices? The tridiagonal
$K_0$ appears many times in this textbook. Now we will see its applications. These
matrices are all symmetric, and the first four are positive definite:


$$\textbf{Fixed-fixed} \quad K_0 = A_0^TA_0 = \begin{bmatrix} 2 & -1 & \\ -1 & 2 & -1 \\ & -1 & 2 \end{bmatrix} \qquad \textbf{Spring constants included} \quad A_0^TC_0A_0 = \begin{bmatrix} c_1 + c_2 & -c_2 & \\ -c_2 & c_2 + c_3 & -c_3 \\ & -c_3 & c_3 + c_4 \end{bmatrix}$$

$$\textbf{Fixed-free} \quad K_1 = A_1^TA_1 = \begin{bmatrix} 2 & -1 & \\ -1 & 2 & -1 \\ & -1 & 1 \end{bmatrix} \qquad \textbf{Spring constants included} \quad A_1^TC_1A_1 = \begin{bmatrix} c_1 + c_2 & -c_2 & \\ -c_2 & c_2 + c_3 & -c_3 \\ & -c_3 & c_3 \end{bmatrix}$$

$$\textbf{Free-free} \quad K_{\text{singular}} = \begin{bmatrix} 1 & -1 & \\ -1 & 2 & -1 \\ & -1 & 1 \end{bmatrix} \qquad K_{\text{circular}} = \begin{bmatrix} 2 & -1 & -1 \\ -1 & 2 & -1 \\ -1 & -1 & 2 \end{bmatrix}$$


The matrices $K_0$, $K_1$, $K_{\text{singular}}$, and $K_{\text{circular}}$ have C = I for simplicity. This means
that all the “spring constants” are $c_i = 1$. We included $A_0^TC_0A_0$ and $A_1^TC_1A_1$ to show how
the spring constants enter the matrix (without changing its positive definiteness). Our first
goal is to show where these stiffness matrices come from.


A Line of Springs 


Figure 8.1 shows three masses $m_1, m_2, m_3$ connected by a line of springs. One case has
four springs, with top and bottom fixed. The fixed-free case has only three springs; the
lowest mass hangs freely. The fixed-fixed problem will lead to $K_0$ and $A_0^TC_0A_0$. The
fixed-free problem will lead to $K_1$ and $A_1^TC_1A_1$. A free-free problem, with no support at
either end, produces the matrix $K_{\text{singular}}$.

We want equations for the mass movements u and the tensions (or compressions) y:


u = $(u_1, u_2, u_3)$ = movements of the masses (down or up)
y = $(y_1, y_2, y_3, y_4)$ or $(y_1, y_2, y_3)$ = tensions in the springs


When a mass moves downward, its displacement is positive ($u_i > 0$). For the springs,
tension is positive and compression is negative ($y_i < 0$). In tension, the spring is stretched
so it pulls the masses inward. Each spring is controlled by its own Hooke's Law y = ce:
(stretching force) = (spring constant) times (stretching distance).

Our job is to link these one-spring equations y = ce into a vector equation Ku = f
for the whole system. The force vector f comes from gravity. The gravitational constant
g will multiply each mass to produce forces $f = (m_1g, m_2g, m_3g)$.


[Figure 8.1 diagram. Fixed-fixed line, top to bottom: fixed end ($u_0 = 0$), spring $c_1$ (tension $y_1$), mass $m_1$ (movement $u_1$), spring $c_2$ ($y_2$), mass $m_2$ ($u_2$), spring $c_3$ ($y_3$), mass $m_3$ ($u_3$), spring $c_4$ ($y_4$), fixed end ($u_4 = 0$). Fixed-free line: the same without spring $c_4$, ending in a free end ($y_4 = 0$).]


Figure 8.1: Lines of springs and masses: fixed-fixed and fixed-free ends.


The real problem is to find the stiffness matrix (fixed-fixed and fixed-free). The best
way to create K is in three steps, not one. Instead of connecting the movements $u_i$ directly
to the forces, it is much better to connect each vector to the next in this list:


u = Movements of n masses = $(u_1, \ldots, u_n)$
e = Elongations of m springs = $(e_1, \ldots, e_m)$
y = Internal forces in m springs = $(y_1, \ldots, y_m)$
f = External forces on n masses = $(f_1, \ldots, f_n)$

The framework that connects u to e to y to f looks like this:

$$\begin{array}{ccc} \boxed{u} & & \boxed{f} \\ A \downarrow & & \uparrow A^T \\ \boxed{e} & \xrightarrow{\;\;C\;\;} & \boxed{y} \end{array} \qquad\qquad \begin{array}{ll} e = Au & A \text{ is } m \text{ by } n \\ y = Ce & C \text{ is } m \text{ by } m \\ f = A^Ty & A^T \text{ is } n \text{ by } m \end{array}$$


We will write down the matrices A and C and $A^T$ for the two examples, first with fixed
ends and then with the lower end free. Forgive the simplicity of these matrices, it is their
form that is so important. Especially the appearance of A together with $A^T$.


The elongation e is the stretching distance: how far the springs are extended. Originally
there is no stretching, because the system is lying on a table. When it becomes vertical
and upright, gravity acts. The masses move down by distances $u_1, u_2, u_3$. Each spring is
stretched or compressed by $e_i = u_i - u_{i-1}$, the difference in displacements of its ends:


Stretching of each spring
First spring: $e_1 = u_1$ (the top is fixed so $u_0 = 0$)
Second spring: $e_2 = u_2 - u_1$
Third spring: $e_3 = u_3 - u_2$
Fourth spring: $e_4 = -u_3$ (the bottom is fixed so $u_4 = 0$)

If both ends move the same distance, that spring is not stretched: $u_i = u_{i-1}$ and $e_i = 0$.
The matrix in those four equations is a 4 by 3 difference matrix A, and e = Au:


$$\textbf{Stretching distances (elongations)} \qquad e = Au \quad \text{is} \quad \begin{bmatrix} e_1 \\ e_2 \\ e_3 \\ e_4 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ -1 & 1 & 0 \\ 0 & -1 & 1 \\ 0 & 0 & -1 \end{bmatrix} \begin{bmatrix} u_1 \\ u_2 \\ u_3 \end{bmatrix}. \qquad (1)$$


The next equation y = Ce connects spring elongation e with spring tension y. This is
Hooke's Law $y_i = c_ie_i$ for each separate spring. It is the “constitutive law” that depends
on the material in the spring. A soft spring has small c, so a moderate force y can produce
a large stretching e. Hooke's linear law is nearly exact for real springs, before they are
overstretched and the material becomes plastic.

Since each spring has its own law, the matrix in y = Ce is a diagonal matrix C:


$$\textbf{Hooke's Law } y = Ce \qquad \begin{array}{l} y_1 = c_1e_1 \\ y_2 = c_2e_2 \\ y_3 = c_3e_3 \\ y_4 = c_4e_4 \end{array} \quad \text{is} \quad \begin{bmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \end{bmatrix} = \begin{bmatrix} c_1 & & & \\ & c_2 & & \\ & & c_3 & \\ & & & c_4 \end{bmatrix} \begin{bmatrix} e_1 \\ e_2 \\ e_3 \\ e_4 \end{bmatrix} \qquad (2)$$


Combining e = Au with y = Ce, the spring forces are y = CAu.

Finally comes the balance equation, the most fundamental law of applied mathematics.
The internal forces from the springs balance the external forces on the masses. Each mass
is pulled or pushed by the spring force $y_j$ above it. From below it feels the spring force
$y_{j+1}$ plus $f_j$ from gravity. Thus $y_j = y_{j+1} + f_j$ or $f_j = y_j - y_{j+1}$:


$$\textbf{Force balance } f = A^Ty \qquad \begin{array}{l} f_1 = y_1 - y_2 \\ f_2 = y_2 - y_3 \\ f_3 = y_3 - y_4 \end{array} \quad \text{is} \quad \begin{bmatrix} f_1 \\ f_2 \\ f_3 \end{bmatrix} = \begin{bmatrix} 1 & -1 & 0 & 0 \\ 0 & 1 & -1 & 0 \\ 0 & 0 & 1 & -1 \end{bmatrix} \begin{bmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \end{bmatrix} \qquad (3)$$


That matrix is $A^T$. The equation for balance of forces is $f = A^Ty$. Nature transposes the
rows and columns of the e - u matrix to produce the f - y matrix. This is the beauty of
the framework, that $A^T$ appears along with A. The three equations combine into Ku = f,
where the stiffness matrix is $K = A^TCA$:


$$\left\{ \begin{array}{l} e = Au \\ y = Ce \\ f = A^Ty \end{array} \right\} \quad \text{combine into} \quad A^TCAu = f \quad \text{or} \quad Ku = f.$$


In the language of elasticity, e = Au is the kinematic equation (for displacement). The
force balance $f = A^Ty$ is the static equation (for equilibrium). The constitutive law is
y = Ce (from the material). Then $A^TCA$ is n by n = (n by m)(m by m)(m by n).
Finite element programs spend major effort on assembling $K = A^TCA$ from thousands
of smaller pieces. We find K for four springs (fixed-fixed) by multiplying $A^T$ times CA:


$$\begin{bmatrix} 1 & -1 & 0 & 0 \\ 0 & 1 & -1 & 0 \\ 0 & 0 & 1 & -1 \end{bmatrix} \begin{bmatrix} c_1 & 0 & 0 \\ -c_2 & c_2 & 0 \\ 0 & -c_3 & c_3 \\ 0 & 0 & -c_4 \end{bmatrix} = \begin{bmatrix} c_1 + c_2 & -c_2 & 0 \\ -c_2 & c_2 + c_3 & -c_3 \\ 0 & -c_3 & c_3 + c_4 \end{bmatrix}$$

If all springs are identical, with $c_1 = c_2 = c_3 = c_4 = 1$, then C = I. The stiffness matrix
reduces to $A^TA$. It becomes the special -1, 2, -1 matrix:


$$\textbf{With } C = I \qquad K_0 = A_0^TA_0 = \begin{bmatrix} 2 & -1 & 0 \\ -1 & 2 & -1 \\ 0 & -1 & 2 \end{bmatrix}. \qquad (4)$$
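The assembly $K = A^TCA$ is short enough to sketch in a few lines. The helper `stiffness` and the spring constants 10, 20, 30, 40 are assumptions chosen for this illustration; the difference matrix is the 4 by 3 matrix A of equation (1).

```python
def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(X):
    return [list(row) for row in zip(*X)]

def stiffness(A, c):
    """Return K = A^T C A where C is the diagonal matrix of spring constants c."""
    CA = [[ci * a for a in row] for ci, row in zip(c, A)]   # row i of A scaled by c_i
    return matmul(transpose(A), CA)

# Fixed-fixed line: four springs, three masses, e = A u.
A0 = [[1, 0, 0], [-1, 1, 0], [0, -1, 1], [0, 0, -1]]

K0 = stiffness(A0, [1, 1, 1, 1])             # C = I gives the -1, 2, -1 matrix
K_general = stiffness(A0, [10, 20, 30, 40])  # hypothetical spring constants
print(K0)
print(K_general)
```

The diagonal of `K_general` shows the sums $c_1 + c_2$, $c_2 + c_3$, $c_3 + c_4$, and the off-diagonal entries are $-c_2$ and $-c_3$.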


Note the difference between $A^TA$ from engineering and $LL^T$ from linear algebra. The
matrix A from four springs is 4 by 3. The triangular matrix L from elimination is square.
The stiffness matrix K is assembled from $A^TA$, and then broken up into $LL^T$. One step
is applied mathematics, the other is computational mathematics. Each K is built from
rectangular matrices and factored into square matrices.

May I list some properties of $K = A^TCA$? You know almost all of them:


1. K is tridiagonal, because mass 3 is not connected to mass 1.

2. K is symmetric, because C is symmetric and $A^T$ comes with A.

3. K is positive definite, because $c_i > 0$ and A has independent columns.

4. $K^{-1}$ is a full matrix in equation (5) with all positive entries.


That last property leads to an important fact about $u = K^{-1}f$: If all forces act downwards
($f_j > 0$) then all movements are downwards ($u_j > 0$). Notice that “positiveness” is
different from “positive definiteness”. Here $K^{-1}$ is positive (K is not). Both K and $K^{-1}$
are positive definite.


Example 1 Suppose all $c_i = c$ and $m_j = m$. Find the movements u and tensions y.
All springs are the same and all masses are the same. But all movements and elongations
and tensions will not be the same. $K^{-1}$ includes $1/c$ because $A^TCA$ includes c:


$$u = K^{-1}f = \frac{1}{4c}\begin{bmatrix} 3 & 2 & 1 \\ 2 & 4 & 2 \\ 1 & 2 & 3 \end{bmatrix} \begin{bmatrix} mg \\ mg \\ mg \end{bmatrix} = \frac{mg}{c}\begin{bmatrix} 3/2 \\ 2 \\ 3/2 \end{bmatrix} \qquad (5)$$


The displacement $u_2$, for the mass in the middle, is greater than $u_1$ and $u_3$. The units are
correct: the force mg divided by force per unit length c gives a length u. Then


$$e = Au = \begin{bmatrix} 1 & 0 & 0 \\ -1 & 1 & 0 \\ 0 & -1 & 1 \\ 0 & 0 & -1 \end{bmatrix} \frac{mg}{c}\begin{bmatrix} 3/2 \\ 2 \\ 3/2 \end{bmatrix} = \frac{mg}{c}\begin{bmatrix} 3/2 \\ 1/2 \\ -1/2 \\ -3/2 \end{bmatrix}.$$


Those elongations add to zero because the ends of the line are fixed. (The sum $u_1 + (u_2 -
u_1) + (u_3 - u_2) + (-u_3)$ is certainly zero.) For each spring force $y_i$ we just multiply $e_i$ by
c. So $y_1, y_2, y_3, y_4$ are $\frac32mg, \frac12mg, -\frac12mg, -\frac32mg$. The upper two springs are stretched,
the lower two springs are compressed.
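The numbers in Example 1 can be reproduced with exact arithmetic. This sketch takes c = 1 and mg = 1, so movements are in units of mg/c and forces in units of mg. The elimination routine `solve` is an illustrative helper that does no row exchanges, which is safe here because K is positive definite.

```python
from fractions import Fraction as F

def solve(K, f):
    """Gaussian elimination without row exchanges, then back substitution."""
    n = len(f)
    M = [[F(x) for x in row] + [F(fi)] for row, fi in zip(K, f)]
    for j in range(n):
        for i in range(j + 1, n):
            factor = M[i][j] / M[j][j]
            M[i] = [a - factor * b for a, b in zip(M[i], M[j])]
    u = [F(0)] * n
    for i in reversed(range(n)):
        u[i] = (M[i][n] - sum(M[i][k] * u[k] for k in range(i + 1, n))) / M[i][i]
    return u

A0 = [[1, 0, 0], [-1, 1, 0], [0, -1, 1], [0, 0, -1]]
K0 = [[2, -1, 0], [-1, 2, -1], [0, -1, 2]]

u = solve(K0, [1, 1, 1])                                      # movements, units of mg/c
e = [sum(a * ui for a, ui in zip(row, u)) for row in A0]      # elongations e = A u
y = [1 * ei for ei in e]                                      # tensions y = c e with c = 1
print(u, e, sum(e))
```

The order of computation is u, then e, then y, exactly as in the text.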

Notice how u, e, y are computed in that order. We assembled $K = A^TCA$ from rectangular
matrices. To find $u = K^{-1}f$, we work with the whole matrix and not its three
pieces! The rectangular matrices A and $A^T$ do not have (two-sided) inverses.
The three matrices are mixed together by $A^TCA$, and they cannot easily be untangled.
In general, $A^Ty = f$ has many solutions. And four equations Au = e would usually
have no solution with three unknowns. But $A^TCA$ gives the correct solution to all three
equations in the framework. Only when m = n and the matrices are square can we go from
$y = (A^T)^{-1}f$ to $e = C^{-1}y$ to $u = A^{-1}e$. We will see that now.


Fixed End and Free End 


Remove the fourth spring. All matrices become 3 by 3. The pattern does not change! The
matrix A loses its fourth row and (of course) $A^T$ loses its fourth column. The new stiffness
matrix $K_1$ becomes a product of square matrices:


$$A_1^TC_1A_1 = \begin{bmatrix} 1 & -1 & 0 \\ 0 & 1 & -1 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} c_1 & & \\ & c_2 & \\ & & c_3 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ -1 & 1 & 0 \\ 0 & -1 & 1 \end{bmatrix}$$


The missing column of $A^T$ and row of A multiplied the missing $c_4$. So the quickest way to
find the new $A^TCA$ is to set $c_4 = 0$ in the old one:


$$\textbf{FIXED FREE} \qquad K_1 = A_1^TC_1A_1 = \begin{bmatrix} c_1 + c_2 & -c_2 & 0 \\ -c_2 & c_2 + c_3 & -c_3 \\ 0 & -c_3 & c_3 \end{bmatrix}. \qquad (6)$$


If $c_1 = c_2 = c_3 = 1$ and C = I, this is the -1, 2, -1 tridiagonal matrix, except the last
entry is 1 instead of 2. The spring at the bottom is free.


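Example 2 All $c_i = c$ and all $m_j = m$ in the fixed-free hanging line of springs. Then


$$K_1 = c\begin{bmatrix} 2 & -1 & 0 \\ -1 & 2 & -1 \\ 0 & -1 & 1 \end{bmatrix} \qquad \text{and} \qquad K_1^{-1} = \frac1c\begin{bmatrix} 1 & 1 & 1 \\ 1 & 2 & 2 \\ 1 & 2 & 3 \end{bmatrix}$$


The forces mg from gravity are the same. But the movements change from the previous
example because the stiffness matrix has changed:


$$u = K_1^{-1}f = \frac1c\begin{bmatrix} 1 & 1 & 1 \\ 1 & 2 & 2 \\ 1 & 2 & 3 \end{bmatrix} \begin{bmatrix} mg \\ mg \\ mg \end{bmatrix} = \frac{mg}{c}\begin{bmatrix} 3 \\ 5 \\ 6 \end{bmatrix}$$


Those movements are greater in this fixed-free case. The number 3 appears in u because
all three masses are pulling the first spring down. The next mass moves by that 3 plus an
additional 2 from the masses below it. The third mass drops even more (3 + 2 + 1 = 6).
The elongations e = Au in the springs display those numbers 3, 2, 1:


$$e = \begin{bmatrix} 1 & 0 & 0 \\ -1 & 1 & 0 \\ 0 & -1 & 1 \end{bmatrix} \frac{mg}{c}\begin{bmatrix} 3 \\ 5 \\ 6 \end{bmatrix} = \frac{mg}{c}\begin{bmatrix} 3 \\ 2 \\ 1 \end{bmatrix}$$

Multiplying by c, the forces y in the three springs are 3mg and 2mg and mg. And the
special point of square matrices is that y can be found directly from f! The balance
equation $A^Ty = f$ determines y immediately, because m = n and $A^T$ is square. We are
allowed to write $(A^TCA)^{-1} = A^{-1}C^{-1}(A^T)^{-1}$:


$$y = (A^T)^{-1}f \quad \text{is} \quad \begin{bmatrix} 1 & 1 & 1 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} mg \\ mg \\ mg \end{bmatrix} = \begin{bmatrix} 3mg \\ 2mg \\ 1mg \end{bmatrix}$$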

Two Free Ends: K is Singular 


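The first line of springs in Figure 8.2 is free at both ends. This means trouble (the whole
line can move). The matrix A is 2 by 3, short and wide. Here is e = Au:


$$\textbf{FREE-FREE} \qquad \begin{bmatrix} e_1 \\ e_2 \end{bmatrix} = \begin{bmatrix} u_2 - u_1 \\ u_3 - u_2 \end{bmatrix} = \begin{bmatrix} -1 & 1 & 0 \\ 0 & -1 & 1 \end{bmatrix} \begin{bmatrix} u_1 \\ u_2 \\ u_3 \end{bmatrix}.$$


Now there is a nonzero solution to Au = 0. The masses can move with no stretching of
the springs. The whole line can shift by u = (1, 1, 1) and this leaves e = (0, 0). A has
dependent columns and the vector (1, 1, 1) is in its nullspace:


$$Au = \begin{bmatrix} -1 & 1 & 0 \\ 0 & -1 & 1 \end{bmatrix} \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix} = \text{no stretching}. \qquad (8)$$


Au = 0 certainly leads to $A^TCAu = 0$. So $A^TCA$ is only positive semidefinite, without $c_1$
and $c_4$. The pivots will be $c_2$ and $c_3$ and no third pivot. The rank is only 2:


$$\begin{bmatrix} -1 & 0 \\ 1 & -1 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} c_2 & \\ & c_3 \end{bmatrix} \begin{bmatrix} -1 & 1 & 0 \\ 0 & -1 & 1 \end{bmatrix} = \begin{bmatrix} c_2 & -c_2 & 0 \\ -c_2 & c_2 + c_3 & -c_3 \\ 0 & -c_3 & c_3 \end{bmatrix} \qquad (9)$$


Two eigenvalues will be positive but x = (1, 1, 1) is an eigenvector for $\lambda = 0$. We can
solve $A^TCAu = f$ only for special vectors f. The forces have to add to $f_1 + f_2 + f_3 = 0$,
or the whole line of springs (with both ends free) will take off like a rocket.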


Circle of Springs 


A third spring will complete the circle from mass 3 back to mass 1. This doesn't make K
invertible: the new matrix is still singular. That stiffness matrix $K_{\text{circular}}$ is not tridiagonal,
but it is symmetric (always) and semidefinite:


$$A^T_{\text{circular}}A_{\text{circular}} = \begin{bmatrix} 1 & -1 & 0 \\ 0 & 1 & -1 \\ -1 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & -1 \\ -1 & 1 & 0 \\ 0 & -1 & 1 \end{bmatrix} = \begin{bmatrix} 2 & -1 & -1 \\ -1 & 2 & -1 \\ -1 & -1 & 2 \end{bmatrix}. \qquad (10)$$


The only pivots are 2 and $\frac32$. The eigenvalues are 3 and 3 and 0. The determinant is zero.
The nullspace still contains x = (1, 1, 1), when all the masses move together.


[Figure 8.2 diagram: masses $m_1, m_2, m_3$ with movements $u_1, u_2, u_3$. The free-free line has springs $c_2, c_3$ with tensions $y_2, y_3$. The circle has springs $c_1, c_2, c_3$.]


Figure 8.2: Free-free ends: A line of springs and a “circle” of springs: Singular K's.
The masses can move without stretching the springs so Au = 0 has nonzero solutions.


This movement vector (1, 1, 1) is in the nullspace of $A_{\text{circular}}$ and $K_{\text{circular}}$, even after
the diagonal matrix C of spring constants is included: the springs are not stretched.


$$(A^TCA)_{\text{circular}} = \begin{bmatrix} c_1 + c_2 & -c_2 & -c_1 \\ -c_2 & c_2 + c_3 & -c_3 \\ -c_1 & -c_3 & c_3 + c_1 \end{bmatrix}. \qquad (11)$$
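Both singular stiffness matrices can be tested directly. This is a minimal sketch with C = I; the helper names are illustrative, and the vector (4, -7, 9) is an arbitrary choice used only to show that the components of Ku always add to zero.

```python
def matvec(M, x):
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

def det3(M):
    """Determinant of a 3 by 3 matrix by cofactor expansion along row 1."""
    (a, b, c), (d, e, f), (g, h, i) = M
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

K_free = [[1, -1, 0], [-1, 2, -1], [0, -1, 1]]
K_circular = [[2, -1, -1], [-1, 2, -1], [-1, -1, 2]]

for K in (K_free, K_circular):
    print(matvec(K, [1, 1, 1]), det3(K))      # [0, 0, 0] 0 for both matrices

# Every column of K adds to zero, so the components of K u add to zero for any u.
print(sum(matvec(K_free, [4, -7, 9])))        # 0: K u = f requires f1 + f2 + f3 = 0

# The circle of springs has the eigenvalue 3 twice.
print(matvec(K_circular, [1, -1, 0]), matvec(K_circular, [1, 0, -1]))
```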


Continuous Instead of Discrete 


Matrix equations are discrete. Differential equations are continuous. We will see the
differential equation that corresponds to the tridiagonal -1, 2, -1 matrix $A^TA$. And it is a
pleasure to see the boundary conditions that go with $K_0$ and $K_1$.

The matrices A and $A^T$ correspond to the derivatives d/dx and -d/dx! Remember
that e = Au took differences $u_i - u_{i-1}$, and $f = A^Ty$ took differences $y_i - y_{i+1}$. Now
the springs are infinitesimally short, and those differences become derivatives:

$$\frac{u_i - u_{i-1}}{\Delta x} \ \text{ is like } \ \frac{du}{dx} \qquad\qquad \frac{y_i - y_{i+1}}{\Delta x} \ \text{ is like } \ -\frac{dy}{dx}$$

The factor $\Delta x$ didn't appear earlier because we imagined the distance between masses was 1. To
approximate a continuous solid bar, we take many more masses (smaller and closer). Let
me jump to the three steps A, C, $A^T$ in the continuous model, when there is stretching and
Hooke's Law and force balance at every point x:


$$e(x) = Au = \frac{du}{dx} \qquad y(x) = c(x)e(x) \qquad A^Ty = -\frac{dy}{dx} = f(x)$$

Combining those equations into $A^TCAu(x) = f(x)$, we have a differential equation not a
matrix equation. The line of springs becomes an elastic bar:

$$A^TCAu = f \quad \text{is} \quad -\frac{d}{dx}\left(c(x)\frac{du}{dx}\right) = f(x) \qquad (12)$$

$A^TA$ corresponds to a second derivative. A is a “difference matrix” and $A^TA$ is a “second
difference matrix”. The matrix has -1, 2, -1 and the equation has $-d^2u/dx^2$:


$$-u_{i+1} + 2u_i - u_{i-1} \ \text{ is a second difference} \qquad\qquad -\frac{d^2u}{dx^2} \ \text{ is a second derivative.}$$
Now we see why this symmetric matrix is a favorite. When we meet a first derivative
du/dx, we have three choices (forward, backward, and centered differences):


$$\frac{du}{dx} \approx \frac{u(x + \Delta x) - u(x)}{\Delta x} \quad \text{or} \quad \frac{u(x) - u(x - \Delta x)}{\Delta x} \quad \text{or} \quad \frac{u(x + \Delta x) - u(x - \Delta x)}{2\Delta x}$$


When we meet $d^2u/dx^2$, the natural choice is $u(x + \Delta x) - 2u(x) + u(x - \Delta x)$, divided
by $(\Delta x)^2$. Why reverse these signs to -1, 2, -1? Because the positive definite matrix has
+2 on the diagonal. First derivatives are antisymmetric; the transpose has a minus sign.
So second differences are negative definite, and we change to $-d^2u/dx^2$.
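The three first differences and the second difference can be compared on a test function. This sketch uses $u(x) = x^3$ at x = 1 with $\Delta x = 1/10$, both hypothetical choices, and exact fractions so that no roundoff hides the errors.

```python
from fractions import Fraction as F

def forward(u, x, h):
    return (u(x + h) - u(x)) / h

def backward(u, x, h):
    return (u(x) - u(x - h)) / h

def centered(u, x, h):
    return (u(x + h) - u(x - h)) / (2 * h)

def second_difference(u, x, h):
    return (u(x + h) - 2 * u(x) + u(x - h)) / h**2

def cube(x):
    return x**3          # test function with u' = 3x^2 and u'' = 6x

x, h = F(1), F(1, 10)
print(forward(cube, x, h), backward(cube, x, h), centered(cube, x, h))  # 331/100 271/100 301/100
print(second_difference(cube, x, h))                                    # 6
```

The exact derivative is 3. The one-sided differences miss by about 0.3 and the centered difference by 0.01. The second difference gives 6 exactly, because its error term involves the fourth derivative, which is zero for a cubic.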

We have moved from vectors to functions. Scientific computing moves the other way.
It starts with a differential equation like (12). Sometimes there is a formula for the solution
u(x), more often not. In reality we create the discrete matrix K by approximating the
continuous problem. Watch how the boundary conditions on u come in! By missing $u_0$ we
treat it (correctly) as zero:


$$\textbf{FIXED FIXED} \qquad Au = \frac{1}{\Delta x}\begin{bmatrix} 1 & 0 & 0 \\ -1 & 1 & 0 \\ 0 & -1 & 1 \\ 0 & 0 & -1 \end{bmatrix} \begin{bmatrix} u_1 \\ u_2 \\ u_3 \end{bmatrix} \approx \frac{du}{dx} \quad \text{with} \quad \begin{array}{l} u_0 = 0 \\ u_4 = 0 \end{array} \qquad (13)$$


Fixing the top end gives the boundary condition $u_0 = 0$. What about the free end, when
the bar hangs in the air? Row 4 of A is gone and so is $u_4$. The boundary condition must
come from $A^T$. It is the missing $y_4$ that we are treating (correctly) as zero:


$$\textbf{FIXED FREE} \qquad A^Ty = \frac{1}{\Delta x}\begin{bmatrix} 1 & -1 & 0 \\ 0 & 1 & -1 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix} \approx -\frac{dy}{dx} \quad \text{with} \quad \begin{array}{l} u_0 = 0 \\ y_4 = 0 \end{array} \qquad (14)$$


The boundary condition $y_4 = 0$ at the free end becomes du/dx = 0, since y = Au
corresponds to du/dx. The force balance $A^Ty = f$ at that end (in the air) is 0 = 0. The
last row of $K_1u = f$ has entries -1, 1 to reflect this condition du/dx = 0.

May I summarize this section? I hope this example will help you turn calculus into
linear algebra, replacing differential equations by difference equations. If your step $\Delta x$ is
small enough, you will have a totally satisfactory solution.


The equation is $-\dfrac{d}{dx}\left(c(x)\dfrac{du}{dx}\right) = f(x)$ with $u(0) = 0$ and $\left[u(1) \text{ or } \dfrac{du}{dx}(1)\right] = 0$


Divide the bar into N pieces of length $\Delta x$. Replace du/dx by Au and -dy/dx by $A^Ty$.
Now A and $A^T$ include $1/\Delta x$. The end conditions are $u_0 = 0$ and [$u_N = 0$ or $y_N = 0$].
The three steps -d/dx and c(x) and d/dx correspond to $A^T$ and C and A:
$f = A^Ty$ and y = Ce and e = Au give $A^TCAu = f$.


This is a fundamental example in computational science and engineering. Our book con- 
centrates on Step 3 in that process (linear algebra). Now we have taken Step 2. 


1. Model the problem by a differential equation 

2. Discretize the differential equation to a difference equation 

3. Understand and solve the difference equation (and boundary conditions!) 
4. Interpret the solution; visualize it; redesign if needed. 


Numerical simulation has become a third branch of science, together with experiment and 
deduction. Designing the Boeing 777 was much less expensive on a computer than in a 
wind tunnel. Our discussion still has to move from ordinary to partial differential equations, 
and from linear to nonlinear. 

The texts Introduction to Applied Mathematics and Computational Science and Engi- 
neering (Wellesley-Cambridge Press) develop this whole subject further—see the course 
page math.mit.edu/18085 with video lectures (also on ocw.mit.edu). The principles re- 
main the same, and I hope this book helps you to see the framework behind the computa- 
tions. 


Problem Set 8.1 


1 Show that $\det A_0^TC_0A_0 = c_1c_2c_3 + c_1c_3c_4 + c_1c_2c_4 + c_2c_3c_4$. Find also $\det A_1^TC_1A_1$
in the fixed-free example.
2 Invert $A_1^TC_1A_1$ in the fixed-free example by multiplying $A_1^{-1}C_1^{-1}(A_1^T)^{-1}$.


3 In the free-free case when $A^TCA$ in equation (9) is singular, add the three equations
$A^TCAu = f$ to show that we need $f_1 + f_2 + f_3 = 0$. Find a solution to $A^TCAu = f$
when the forces $f = (-1, 0, 1)$ balance themselves. Find all solutions!


4 Both end conditions for the free-free differential equation are $du/dx = 0$:

$$-\frac{d}{dx}\left(c(x)\frac{du}{dx}\right) = f(x)\quad\text{with}\quad\frac{du}{dx} = 0\ \text{at both ends.}$$

Integrate both sides to show that the force f(x) must balance itself, $\int f(x)\,dx = 0$,
or there is no solution. The complete solution is one particular solution u(x) plus
any constant. The constant corresponds to u = (1, 1, 1) in the nullspace of $A^TCA$.


5 In the fixed-free problem, the matrix A is square and invertible. We can solve $A^Ty = f$
separately from $Au = e$. Do the same for the differential equation:

Solve $-\dfrac{dy}{dx} = f(x)$ with $y(1) = 0$. Graph $y(x)$ if $f(x) = 1$.

6 The 3 by 3 matrix $K_1 = A_1^TC_1A_1$ in equation (6) splits into three “element matrices”
$c_1E_1 + c_2E_2 + c_3E_3$. Write down those pieces, one for each c. Show how they
come from column times row multiplication of $A_1^TC_1A_1$. This is how finite element
stiffness matrices are actually assembled.


7 For five springs and four masses with both ends fixed, what are the matrices A and
C and K? With C = I solve Ku = ones(4).

8 Compare the solution $u = (u_1, u_2, u_3, u_4)$ in Problem 7 to the solution of the
continuous problem $-u'' = 1$ with $u(0) = 0$ and $u(1) = 0$. The parabola $u(x)$ should
correspond at $x = \frac{1}{5}, \frac{2}{5}, \frac{3}{5}, \frac{4}{5}$ to u. Is there a $(\Delta x)^2$ factor to account for?


9 Solve the fixed-free problem $-u'' = mg$ with $u(0) = 0$ and $u'(1) = 0$. Compare
$u(x)$ at $x = \frac{1}{3}, \frac{2}{3}, \frac{3}{3}$ with the vector $u = (3mg, 5mg, 6mg)$ in Example 2.

10 Suppose $c_1 = c_2 = c_3 = c_4 = 1$, $m_1 = 2$ and $m_2 = m_3 = 1$. Solve $A^TCAu = (2, 1, 1)$
for this fixed-fixed line of springs. Which mass moves the most (largest u)?


11 (MATLAB) Find the displacements u(1), ..., u(100) of 100 masses connected by
springs all with c = 1. Each force is f(i) = .01. Print graphs of u with fixed-fixed
and fixed-free ends. Note that diag(ones(n, 1), d) is a matrix with n ones along
diagonal d. This print command will graph a vector u:


plot(u,'+'); xlabel('mass number'); ylabel('movement'); print


12 (MATLAB) Chemical engineering has a first derivative $du/dx$ from fluid velocity as
well as $d^2u/dx^2$ from diffusion. Replace $du/dx$ by a forward difference, then a
centered difference, then a backward difference, with $\Delta x = \frac{1}{8}$. Graph your three
numerical solutions of

$$-\frac{d^2u}{dx^2} + 10\,\frac{du}{dx} = 1\quad\text{with}\quad u(0) = u(1) = 0.$$

This convection-diffusion equation appears everywhere. It transforms to the
Black-Scholes equation for option prices in mathematical finance.


Problem 12 is developed into the first MATLAB homework in my 18.085 course on
Computational Science and Engineering at MIT. Videos on ocw.mit.edu.


8.2 Graphs and Networks


Over the years I have seen one model so often, and I found it so basic and useful, that I 
always put it first. The model consists of nodes connected by edges. This is called a graph. 
Graphs of the usual kind display functions f(x). Graphs of this node-edge kind lead 
to matrices. This section is about the incidence matrix of a graph, which tells how the n
nodes are connected by the m edges. Normally m > n, there are more edges than nodes.


For any m by n matrix there are two fundamental subspaces in $\mathbf{R}^n$ and two in $\mathbf{R}^m$. They
are the row spaces and nullspaces of $A$ and $A^T$. Their dimensions are related by the most
important theorem in linear algebra. The second part of that theorem is the orthogonality of 
the subspaces. Our goal is to show how examples from graphs illuminate the Fundamental 
Theorem of Linear Algebra. 

We review the four subspaces (for any matrix). Then we construct a directed graph and 
its incidence matrix. The dimensions will be easy to discover. But we want the subspaces 
themselves—this is where orthogonality helps. It is essential to connect the subspaces to 
the graph they come from. By specializing to incidence matrices, the laws of linear algebra 
become Kirchhoff’s laws. Please don’t be put off by the words “current” and “voltage” and 
“Kirchhoff.” These rectangular matrices are the best. 


Every entry of an incidence matrix is 0 or 1 or $-1$. This continues to hold during
elimination. All pivots and multipliers are $\pm 1$. Therefore both factors in $A = LU$ also
contain 0, 1, $-1$. So do the nullspace matrices! All four subspaces have basis vectors with
these exceptionally simple components. The matrices are not concocted for a textbook, 
they come from a model that is absolutely essential in pure and applied mathematics. 


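Here is a first incidence matrix. Notice $-1$ and $1$ in each row. This matrix takes
differences in voltage, across six edges of a graph. The voltages are $x_1, x_2, x_3, x_4$ at the
four nodes in Figure 8.4, where we will construct this matrix A. Its echelon form is U:


$$\begin{matrix}\text{Incidence matrix}\\ \text{6 edges, 4 nodes}\end{matrix}\qquad A = \begin{bmatrix} -1 & 1 & 0 & 0\\ -1 & 0 & 1 & 0\\ 0 & -1 & 1 & 0\\ -1 & 0 & 0 & 1\\ 0 & -1 & 0 & 1\\ 0 & 0 & -1 & 1\end{bmatrix}\quad\text{reduces to}\quad U = \begin{bmatrix} -1 & 1 & 0 & 0\\ 0 & -1 & 1 & 0\\ 0 & 0 & -1 & 1\\ 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 0\end{bmatrix}$$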


The nullspace of A and U is the line through x = (1, 1, 1, 1). The column spaces of A and 
U have dimension r = 3. The pivot rows are a basis for the row space. 

Figure 8.3 shows more—the subspaces are orthogonal. Every vector in the nullspace is 
perpendicular to every vector in the row space. This comes directly from the m equations 
Ax = 0. For A and U above, x = (1, 1, 1, 1) is perpendicular to all rows and thus to the 
whole row space. Equal voltages produce no current! 
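Those statements about A and U can be checked by machine. The sketch below assumes nothing beyond the 6 by 4 matrix above; it uses exact fractions from the Python standard library, and the function name `echelon` is only illustrative.

```python
from fractions import Fraction

A = [[-1, 1, 0, 0],
     [-1, 0, 1, 0],
     [0, -1, 1, 0],
     [-1, 0, 0, 1],
     [0, -1, 0, 1],
     [0, 0, -1, 1]]

def echelon(M):
    """Forward elimination with row exchanges; returns (echelon form, rank)."""
    U = [[Fraction(v) for v in row] for row in M]
    rows, cols = len(U), len(U[0])
    r = 0                                   # number of pivots found so far
    for j in range(cols):
        pivot = next((i for i in range(r, rows) if U[i][j] != 0), None)
        if pivot is None:
            continue                        # free column, no pivot
        U[r], U[pivot] = U[pivot], U[r]
        for i in range(r + 1, rows):
            factor = U[i][j] / U[r][j]
            U[i] = [a - factor * b for a, b in zip(U[i], U[r])]
        r += 1
    return U, r

U, rank = echelon(A)
ones = [1, 1, 1, 1]
A_times_ones = [sum(a * x for a, x in zip(row, ones)) for row in A]

print(rank)           # number of pivot rows
print(A_times_ones)   # every row of A is perpendicular to (1, 1, 1, 1)
```

The three nonzero rows of the computed U are the pivot rows shown above, and the zero vector for `A_times_ones` is the statement that equal voltages produce no current.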


I would like to review the Four Fundamental Subspaces before using them. The
whole point will be to see their meaning on the network.


[Figure 8.3 labels: column space (dim $r$), nullspace of $A$ (dim $n - r$), nullspace of $A^T$.]


Figure 8.3: Big picture: The four subspaces with their dimensions and orthogonality.


Start with an m by n matrix. Its columns are vectors in $\mathbf{R}^m$. Their linear combinations
produce the column space $C(A)$, a subspace of $\mathbf{R}^m$. Those combinations are exactly the
matrix-vector products $Ax$.

The rows of A are vectors in $\mathbf{R}^n$ (or they would be, if they were column vectors). Their
linear combinations produce the row space. To avoid any inconvenience with rows, we
transpose the matrix. The row space becomes $C(A^T)$, the column space of $A^T$.

The central questions of linear algebra come from these two ways of looking at the 
same numbers, by columns and by rows. 

The nullspace $N(A)$ contains every x that satisfies $Ax = 0$; this is a subspace of $\mathbf{R}^n$.
The “left” nullspace contains all solutions to $A^Ty = 0$. Now y has m components, and
$N(A^T)$ is a subspace of $\mathbf{R}^m$. Written as $y^TA = 0^T$, we are combining rows of A to produce
the zero row. The four subspaces are illustrated by Figure 8.3, which shows $\mathbf{R}^n$ on one side
and $\mathbf{R}^m$ on the other. The link between them is A.

The information in that figure is crucial. First come the dimensions, which obey the
two central laws of linear algebra:

$$\dim C(A) = \dim C(A^T)\quad\text{and}\quad\dim C(A^T) + \dim N(A) = n.$$


When the row space has dimension r, the nullspace has dimension n — r. Elimination 
leaves these two spaces unchanged, and the echelon form U gives the dimension count. 
There are r rows and columns with pivots. There are n — r free columns without pivots, 
and those lead to vectors in the nullspace.

This review of the subspaces applies to any matrix A; only the example was special.
Now we concentrate on that example. It is the incidence matrix for a particular graph, and 
we look to the graph for the meaning of every subspace. 


Directed Graphs and Incidence Matrices 


Figure 8.4 displays a graph with m = 6 edges and n = 4 nodes, so the matrix A is 6 by
4. It tells which nodes are connected by which edges. The entries $-1$ and $+1$ also tell
the direction of each arrow (this is a directed graph). The first row $-1, 1, 0, 0$ of A gives a
record of the first edge from node 1 to node 2:


$$A = \begin{bmatrix} -1 & 1 & 0 & 0\\ -1 & 0 & 1 & 0\\ 0 & -1 & 1 & 0\\ -1 & 0 & 0 & 1\\ 0 & -1 & 0 & 1\\ 0 & 0 & -1 & 1\end{bmatrix}\begin{matrix}1\\ 2\\ 3\\ 4\\ 5\\ 6\end{matrix}\quad\text{(rows: edges 1 to 6; columns: nodes 1 to 4)}$$


Figure 8.4a: Complete graph with m = 6 edges and n = 4 nodes. 


Row numbers are edge numbers, column numbers are node numbers. 
You can write down A immediately by looking at the graph. 


The second graph has the same four nodes but only three edges. Its incidence matrix is 
3 by 4: 


$$B = \begin{bmatrix} -1 & 1 & 0 & 0\\ 0 & -1 & 1 & 0\\ 0 & 0 & -1 & 1\end{bmatrix}$$


Figure 8.4b: Tree with 3 edges and 4 nodes and no loops. 


The first graph is complete: every pair of nodes is connected by an edge. The second graph
is a tree: the graph has no closed loops. Those graphs are the two extremes. The maximum
number of edges is $\frac{1}{2}n(n - 1)$ and the minimum (a tree) is $m = n - 1$.

The rows of B match the nonzero rows of U, the echelon form found earlier. Elimination
reduces every graph to a tree. The loops produce zero rows in U. Look at the loop
from edges 1, 2, 3 in the first graph, which leads to a zero row:


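$$\begin{bmatrix} -1 & 1 & 0 & 0\\ -1 & 0 & 1 & 0\\ 0 & -1 & 1 & 0\end{bmatrix}\rightarrow\begin{bmatrix} -1 & 1 & 0 & 0\\ 0 & -1 & 1 & 0\\ 0 & -1 & 1 & 0\end{bmatrix}\rightarrow\begin{bmatrix} -1 & 1 & 0 & 0\\ 0 & -1 & 1 & 0\\ 0 & 0 & 0 & 0\end{bmatrix}$$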


Those steps are typical. When two edges share a node, elimination produces the “shortcut
edge” without that node. If the graph already has this shortcut edge, elimination gives a
row of zeros. When the dust clears we have a tree.

An idea suggests itself: Rows are dependent when edges form a loop. Independent
rows come from trees. This is the key to the row space. We are assuming that the graph
is connected, and it makes no fundamental difference which way the arrows go. On each 
edge, flow with the arrow is “positive.” Flow in the opposite direction counts as negative. 
The flow might be a current or a signal or a force—or even oil or gas or water. 
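The link between loops and dependent rows can be seen without any elimination. The sketch below is an assumption-laden illustration, not the book's method: it scans the six edges in order and keeps an edge only when it joins two pieces that were not yet connected (a standard union-find test). An edge that would close a loop is skipped, just as its row would become a zero row. The names `find` and `tree_edges` are hypothetical.

```python
def find(parent, i):
    """Follow parent links to the representative of the piece containing node i."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]   # shorten the path as we go
        i = parent[i]
    return i

def tree_edges(n_nodes, edges):
    """Return the numbers of the edges kept; the kept edges form a tree."""
    parent = list(range(n_nodes + 1))   # nodes are numbered from 1
    kept = []
    for k, (a, b) in enumerate(edges, start=1):
        ra, rb = find(parent, a), find(parent, b)
        if ra != rb:                    # edge joins two separate pieces
            parent[ra] = rb
            kept.append(k)
        # otherwise the edge closes a loop: its row depends on earlier rows
    return kept

# Edges 1..6 of the complete graph, each written as (start node, end node)
edges = [(1, 2), (1, 3), (2, 3), (1, 4), (2, 4), (3, 4)]
kept = tree_edges(4, edges)
print(kept)
```

For this ordering the kept edges are 1, 2, 4. Those are n - 1 = 3 independent rows, and each skipped edge (3, 5, 6) closes a loop with edges already kept.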

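For the column space we look at Ax, which is a vector of differences:


$$Ax = \begin{bmatrix} -1 & 1 & 0 & 0\\ -1 & 0 & 1 & 0\\ 0 & -1 & 1 & 0\\ -1 & 0 & 0 & 1\\ 0 & -1 & 0 & 1\\ 0 & 0 & -1 & 1\end{bmatrix}\begin{bmatrix}x_1\\ x_2\\ x_3\\ x_4\end{bmatrix} = \begin{bmatrix}x_2 - x_1\\ x_3 - x_1\\ x_3 - x_2\\ x_4 - x_1\\ x_4 - x_2\\ x_4 - x_3\end{bmatrix}. \tag{1}$$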


The unknowns $x_1, x_2, x_3, x_4$ represent potentials or voltages at the nodes. Then Ax gives
the potential differences or voltage differences across the edges. It is these differences 
that cause flows. We now examine the meaning of each subspace. 


1 The nullspace contains the solutions to Ax = 0. All six potential differences are zero. 
This means: All four potentials are equal. Every x in the nullspace is a constant vector 
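(c,c,c,c). The nullspace of A is a line in $\mathbf{R}^n$; its dimension is $n - r = 1$.

The second incidence matrix B has the same nullspace. It contains (1, 1, 1, 1):


$$Bx = \begin{bmatrix} -1 & 1 & 0 & 0\\ 0 & -1 & 1 & 0\\ 0 & 0 & -1 & 1\end{bmatrix}\begin{bmatrix}1\\ 1\\ 1\\ 1\end{bmatrix} = \begin{bmatrix}0\\ 0\\ 0\end{bmatrix}$$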


We can raise or lower all potentials by the same amount c, without changing the dif- 
ferences. There is an “arbitrary constant” in the potentials. Compare this with the same 
statement for functions. We can raise or lower f(x) by the same amount C, without changing
its derivative. There is an arbitrary constant C in the integral.

Calculus adds “+C” to indefinite integrals. Graph theory adds (c,c,c,c) to the vector
x of potentials. Linear algebra adds any vector $x_n$ in the nullspace to one particular solution
of $Ax = b$.

The “+C” disappears in calculus when the integral starts at a known point x = a.
Similarly the nullspace disappears when we set x4 = 0. The unknown x4 is removed and 
so are the fourth columns of A and B. Electrical engineers would say that node 4 has been 
“grounded.” 


2 The row space contains all combinations of the six rows. Its dimension is certainly not 
six. The equation r + (n — r) = n must be 3 + 1 = 4. The rank is r = 3, as we 
also saw from elimination. After 3 edges, we start forming loops! The new rows are not 
independent. 

How can we tell if v = (v1, v2, v3, v4) is in the row space? The slow way is to combine 
rows. The quick way is by orthogonality: 


v is in the row space if and only if it is perpendicular to (1, 1,1, 1) in the nullspace. 


The vector $v = (0, 1, 2, 3)$ fails this test: its components add to 6. The vector $(-6, 1, 2, 3)$
passes the test. It lies in the row space because its components add to zero. It equals
6(row 1) + 5(row 3) + 3(row 6).

Each row of A adds to zero. This must be true for every vector in the row space. 
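Both the quick test and the slow way are easy to confirm. The following sketch assumes the 6 by 4 incidence matrix of the complete graph; the function name `in_row_space` is illustrative, and the test it implements is valid here only because the nullspace is the line through (1, 1, 1, 1).

```python
A = [[-1, 1, 0, 0],
     [-1, 0, 1, 0],
     [0, -1, 1, 0],
     [-1, 0, 0, 1],
     [0, -1, 0, 1],
     [0, 0, -1, 1]]

def in_row_space(v):
    """Quick test: v is perpendicular to (1, 1, 1, 1), so its components add to zero."""
    return sum(v) == 0

# The slow way for (-6, 1, 2, 3): combine rows 1, 3 and 6 with weights 6, 5, 3
combo = [6 * a + 5 * b + 3 * c for a, b, c in zip(A[0], A[2], A[5])]

print(combo)                          # the combination of rows
print(in_row_space([0, 1, 2, 3]))     # components add to 6
print(in_row_space([-6, 1, 2, 3]))    # components add to 0
```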


3 The column space contains all combinations of the four columns. We expect three in- 
dependent columns, since there were three independent rows. The first three columns are 
independent (so are any three). But the four columns add to the zero vector, which says 
again that (1, 1, 1,1) is in the nullspace. How can we tell if a particular vector b is in the 
column space of an incidence matrix? 


First answer Try to solve $Ax = b$. That misses all the insight. As before, orthogonality
gives a better answer. We are now coming to Kirchhoff’s two famous laws of circuit
theory: the voltage law and current law. Those are natural expressions of “laws” of linear
algebra. It is especially pleasant to see the key role of the left nullspace.


Second answer Ax is the vector of differences in equation (1). If we add differences 
around a closed loop in the graph, the cancellation leaves zero. Around the big triangle 
formed by edges $1, 3, -2$ (the arrow goes backward on edge 2) the differences cancel:


Voltage Law $(x_2 - x_1) + (x_3 - x_2) - (x_3 - x_1) = 0.$


The components of Ax add to zero around every loop. When b is in the column space of 
A, it must obey the same law: 


Kirchhoff’s Law: $b_1 + b_3 - b_2 = 0.$

By testing each loop, we decide whether b is in the column space. Ax = b can be solved
exactly when the components of b satisfy all the same dependencies as the rows of A. Then
elimination leads to 0 = 0, and Ax = b is consistent.
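The loop test can be written out directly. In this sketch the three loop vectors are an assumption about the picture: they are the three small triangles through node 4, with $+1$ for an edge followed along its arrow and $-1$ against it. The helper names `dot` and `obeys_voltage_law` are illustrative.

```python
A = [[-1, 1, 0, 0],
     [-1, 0, 1, 0],
     [0, -1, 1, 0],
     [-1, 0, 0, 1],
     [0, -1, 0, 1],
     [0, 0, -1, 1]]

# One vector per small loop: +1 along the arrow, -1 against it
LOOPS = [
    [1, 0, 0, -1, 1, 0],    # nodes 1 -> 2 -> 4 -> 1: edges 1, 5, then 4 backward
    [0, 0, 1, 0, -1, 1],    # nodes 2 -> 3 -> 4 -> 2: edges 3, 6, then 5 backward
    [0, -1, 0, 1, 0, -1],   # nodes 1 -> 4 -> 3 -> 1: edge 4, then 6 and 2 backward
]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def obeys_voltage_law(b):
    """True when the components of b add to zero around each of the three loops."""
    return all(dot(y, b) == 0 for y in LOOPS)

x = [3, 1, 4, 1]                         # any potentials will do
b_good = [dot(row, x) for row in A]      # b = Ax is in the column space
b_bad = [1, 0, 0, 0, 0, 0]               # a voltage drop on edge 1 only

print(b_good, obeys_voltage_law(b_good))
print(b_bad, obeys_voltage_law(b_bad))
```

Three loops are enough because the left nullspace has dimension 6 - 3 = 3; every other loop (the big triangle, for example) is a combination of these.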
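4 The left nullspace contains the solutions to $A^Ty = 0$. Its dimension is $m - r = 6 - 3$:


$$\begin{matrix}\text{Current}\\ \text{Law (KCL)}\end{matrix}\qquad A^Ty = \begin{bmatrix} -1 & -1 & 0 & -1 & 0 & 0\\ 1 & 0 & -1 & 0 & -1 & 0\\ 0 & 1 & 1 & 0 & 0 & -1\\ 0 & 0 & 0 & 1 & 1 & 1\end{bmatrix}\begin{bmatrix}y_1\\ y_2\\ y_3\\ y_4\\ y_5\\ y_6\end{bmatrix} = \begin{bmatrix}0\\ 0\\ 0\\ 0\end{bmatrix}. \tag{2}$$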


The true number of equations is r = 3 and not n = 4. Reason: The four equations add to 
0 = 0. The fourth equation follows automatically from the first three. 

What do the equations mean? The first equation says that $-y_1 - y_2 - y_4 = 0$. The net
flow into node 1 is zero. The fourth equation says that $y_4 + y_5 + y_6 = 0$. Flow into the
node minus flow out is zero. The equations $A^Ty = 0$ are famous and fundamental:


Kirchhoff’s Current Law: Flow in equals flow out at each node.


This law deserves first place among the equations of applied mathematics. It expresses
“conservation” and “continuity” and “balance.” Nothing is lost, nothing is gained. When
currents or forces are in equilibrium, the equation to solve is $A^Ty = 0$. Notice the beautiful
fact that the matrix in this balance equation is the transpose of the incidence matrix A.

What are the actual solutions to $A^Ty = 0$? The currents must balance themselves.
The easiest way is to flow around a loop. If a unit of current goes around the big triangle
(forward on edge 1, forward on 3, backward on 2), the vector is $y = (1, -1, 1, 0, 0, 0)$.
This satisfies $A^Ty = 0$. Every loop current is a solution to the Current Law. Around the
loop, flow in equals flow out at every node. A smaller loop goes forward on edge 1, forward
on 5, back on 4. Then $y = (1, 0, 0, -1, 1, 0)$ is also in the left nullspace.

We expect three independent y’s, since 6 — 3 = 3. The three small loops in the graph 
are independent. The big triangle seems to give a fourth y, but it is the sum of flows around 
the small loops. The small loops give a basis for the left nullspace. 
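A short check of the Current Law for loop currents, under the same assumption as before about which triangles are the small loops (the three through node 4). The function name `At_times` is illustrative; it computes the four components of $A^Ty$ without forming the transpose.

```python
A = [[-1, 1, 0, 0],
     [-1, 0, 1, 0],
     [0, -1, 1, 0],
     [-1, 0, 0, 1],
     [0, -1, 0, 1],
     [0, 0, -1, 1]]

def At_times(y):
    """The four components of A^T y: one balance equation per node."""
    return [sum(A[e][node] * y[e] for e in range(6)) for node in range(4)]

small = [
    [1, 0, 0, -1, 1, 0],    # forward on edges 1 and 5, back on 4
    [0, 0, 1, 0, -1, 1],    # forward on edges 3 and 6, back on 5
    [0, -1, 0, 1, 0, -1],   # forward on edge 4, back on 6 and 2
]

# Adding the three small loop currents cancels the flows on edges 4, 5, 6
big = [sum(col) for col in zip(*small)]

for y in small + [big]:
    print(y, At_times(y))
```

Each loop current gives the zero vector for $A^Ty$, and the sum of the small loops is the big triangle $(1, -1, 1, 0, 0, 0)$, so the big loop is not a fourth independent solution.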


The three small loops add to the big loop:

$$\begin{bmatrix}1\\ 0\\ 0\\ -1\\ 1\\ 0\end{bmatrix} + \begin{bmatrix}0\\ 0\\ 1\\ 0\\ -1\\ 1\end{bmatrix} + \begin{bmatrix}0\\ -1\\ 0\\ 1\\ 0\\ -1\end{bmatrix} = \begin{bmatrix}1\\ -1\\ 1\\ 0\\ 0\\ 0\end{bmatrix}\qquad\text{small loops}\quad\text{big loop}$$


Summary The incidence matrix A comes from a connected graph with n nodes and m
edges. The row space and column space have dimensions $r = n - 1$. The nullspaces of A
and $A^T$ have dimensions 1 and $m - n + 1$:

1 The constant vectors (c, c, ..., c) make up the nullspace of A.

2 There are $r = n - 1$ independent rows, using edges from any tree.

3 Voltage law: The components of Ax add to zero around every loop.

4 Current law: $A^Ty = 0$ is solved by loop currents. $N(A^T)$ has dimension $m - r$.
There are $m - r = m - n + 1$ independent loops in the graph.

For every graph in a plane, linear algebra yields Euler’s formula: 
(number of nodes) $-$ (number of edges) $+$ (number of small loops) $= 1$.


This is $n - m + (m - n + 1) = 1$. The graph in our example has $4 - 6 + 3 = 1$.
A single triangle has (3 nodes) $-$ (3 edges) $+$ (1 loop). On a 10-node tree with 9 edges
and no loops, Euler’s count is $10 - 9 + 0$. All planar graphs lead to the answer 1.
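Euler's count can be reproduced from the incidence matrix alone, taking the number of independent loops to be $m - r$, the dimension of $N(A^T)$. This is a sketch with illustrative helper names (`incidence`, `rank`, `euler_count`); the 10-node tree is assumed to be a simple chain, since the text does not say which tree it has in mind.

```python
from fractions import Fraction

def incidence(n_nodes, edges):
    """One row per edge (a, b): -1 in column a and +1 in column b."""
    rows = []
    for a, b in edges:
        row = [0] * n_nodes
        row[a - 1], row[b - 1] = -1, 1
        rows.append(row)
    return rows

def rank(M):
    """Count the pivots found by forward elimination in exact arithmetic."""
    U = [[Fraction(v) for v in row] for row in M]
    r = 0
    for j in range(len(U[0])):
        p = next((i for i in range(r, len(U)) if U[i][j] != 0), None)
        if p is None:
            continue
        U[r], U[p] = U[p], U[r]
        for i in range(r + 1, len(U)):
            factor = U[i][j] / U[r][j]
            U[i] = [a - factor * b for a, b in zip(U[i], U[r])]
        r += 1
    return r

def euler_count(n_nodes, edges):
    m = len(edges)
    loops = m - rank(incidence(n_nodes, edges))   # dimension of N(A^T)
    return n_nodes - m + loops

complete4 = [(1, 2), (1, 3), (2, 3), (1, 4), (2, 4), (3, 4)]
triangle = [(1, 2), (1, 3), (2, 3)]
chain10 = [(i, i + 1) for i in range(1, 10)]      # a 10-node tree with 9 edges

for n, edges in [(4, complete4), (3, triangle), (10, chain10)]:
    print(n, len(edges), euler_count(n, edges))
```

The count is $n - r$, which equals 1 for every connected graph. Planarity is what lets us read the $m - n + 1$ independent loops as the small loops in the picture.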


Networks and $A^TCA$


In a real network, the current y along an edge is the product of two numbers. One number 
is the difference between the potentials x at the ends of the edge. This difference is Ax and 
it drives the flow. The other number is the “conductance” c—which measures how easily 
flow gets through. 

In physics and engineering, c is decided by the material. For electrical currents, c 
is high for metal and low for plastics. For a superconductor, c is nearly infinite. If we
consider elastic stretching, c might be low for metal and higher for plastics. In economics, 
c measures the capacity of an edge or its cost. 

To summarize, the graph is known from its “connectivity matrix” A. This tells the 
connections between nodes and edges. A network goes further, and assigns a conductance c
to each edge. These numbers $c_1, \ldots, c_m$ go into the “conductance matrix” C, which is
diagonal.


For a network of resistors, the conductance is c = 1/(resistance). In addition to Kirch- 
hoff’s Laws for the whole system of currents, we have Ohm’s Law for each particular 
current. Ohm’s Law connects the current $y_1$ on edge 1 to the potential difference $x_2 - x_1$
between the nodes: 


Ohm’s Law: Current along edge = conductance times potential difference. 


Ohm’s Law for all m currents is $y = -CAx$. The vector Ax gives the potential differences,
and C multiplies by the conductances. Combining Ohm’s Law with Kirchhoff’s Current
Law $A^Ty = 0$, we get $A^TCAx = 0$. This is almost the central equation for network
flows. The only thing wrong is the zero on the right side! The network needs power from
outside (a voltage source or a current source) to make something happen.


Note about signs In circuit theory we change from $Ax$ to $-Ax$. The flow is from higher
potential to lower potential. There is (positive) current from node 1 to node 2 when $x_1 - x_2$
is positive, whereas $Ax$ was constructed to yield $x_2 - x_1$. The minus sign in physics and
electrical engineering is a plus sign in mechanical engineering and economics. $Ax$ versus
$-Ax$ is a general headache but unavoidable.


Note about applied mathematics Every new application has its own form of Ohm’s law. 
For elastic structures y = CAx is Hooke’s law. The stress y is (elasticity C) times (stretch- 
ing Ax). For heat conduction, Ax is a temperature gradient. For oil flows it is a pressure 
gradient. There is a similar law in Section 8.6 for least squares regression in statistics. 


My textbooks Introduction to Applied Mathematics and Computational Science and
Engineering (Wellesley-Cambridge Press) are practically built on $A^TCA$. This is the key
to equilibrium in matrix equations and also in differential equations. Applied mathematics
is more organized than it looks. I have learned to watch for $A^TCA$.


We now give an example with a current source. Kirchhoff’s Law changes from
$A^Ty = 0$ to $A^Ty = f$, to balance the source f from outside. Flow into each node
still equals flow out. Figure 8.5 shows the network with its conductances $c_1, \ldots, c_6$, and
it shows the current source going into node 1. The source comes out at node 4 to keep the
balance (in = out). The problem is: Find the currents $y_1, \ldots, y_6$ on the six edges.




Figure 8.5: The currents in a network with a source S into node 1. 


Example 1 All conductances are c = 1, so that C = I. A current $y_4$ travels directly
from node 1 to node 4. Other current goes the long way from node 1 to node 2 to node 4
(this is $y_1 = y_5$). Current also goes from node 1 to node 3 to node 4 (this is $y_2 = y_6$). We
can find the six currents by using special rules for symmetry, or we can do it right by using
$A^TCA$. Since C = I, this matrix is $A^TA$, the graph Laplacian matrix:


$$A^TA = \begin{bmatrix} -1 & -1 & 0 & -1 & 0 & 0\\ 1 & 0 & -1 & 0 & -1 & 0\\ 0 & 1 & 1 & 0 & 0 & -1\\ 0 & 0 & 0 & 1 & 1 & 1\end{bmatrix}\begin{bmatrix} -1 & 1 & 0 & 0\\ -1 & 0 & 1 & 0\\ 0 & -1 & 1 & 0\\ -1 & 0 & 0 & 1\\ 0 & -1 & 0 & 1\\ 0 & 0 & -1 & 1\end{bmatrix} = \begin{bmatrix} 3 & -1 & -1 & -1\\ -1 & 3 & -1 & -1\\ -1 & -1 & 3 & -1\\ -1 & -1 & -1 & 3\end{bmatrix}$$


That last matrix is not invertible! We cannot solve for all four potentials because (1, 1, 1, 1) 
is in the nullspace. One node has to be grounded. Setting x4 = 0 removes the fourth row 
and column, and this leaves a 3 by 3 invertible matrix. Now we solve $A^TCAx = f$ for the
unknown potentials $x_1, x_2, x_3$, with source S into node 1:


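$$\text{Voltages}\qquad\begin{bmatrix} 3 & -1 & -1\\ -1 & 3 & -1\\ -1 & -1 & 3\end{bmatrix}\begin{bmatrix}x_1\\ x_2\\ x_3\end{bmatrix} = \begin{bmatrix}S\\ 0\\ 0\end{bmatrix}\quad\text{gives}\quad\begin{bmatrix}x_1\\ x_2\\ x_3\end{bmatrix} = \begin{bmatrix}S/2\\ S/4\\ S/4\end{bmatrix}$$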


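Ohm’s Law $y = -CAx$ yields the six currents. Remember C = I and $x_4 = 0$:


$$\text{Currents}\qquad\begin{bmatrix}y_1\\ y_2\\ y_3\\ y_4\\ y_5\\ y_6\end{bmatrix} = -\begin{bmatrix} -1 & 1 & 0 & 0\\ -1 & 0 & 1 & 0\\ 0 & -1 & 1 & 0\\ -1 & 0 & 0 & 1\\ 0 & -1 & 0 & 1\\ 0 & 0 & -1 & 1\end{bmatrix}\begin{bmatrix}S/2\\ S/4\\ S/4\\ 0\end{bmatrix} = \begin{bmatrix}S/4\\ S/4\\ 0\\ S/2\\ S/4\\ S/4\end{bmatrix}$$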


Half the current goes directly on edge 4. That is $y_4 = S/2$. No current crosses from node
2 to node 3. Symmetry indicated $y_3 = 0$ and now the solution proves it.
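The whole example fits in a few lines of exact arithmetic. This sketch assumes a unit source S = 1 and C = I, grounds node 4 by dropping the last row and column of $A^TA$, and uses a small Gauss-Jordan solver whose name (`solve`) is illustrative.

```python
from fractions import Fraction

A = [[-1, 1, 0, 0],
     [-1, 0, 1, 0],
     [0, -1, 1, 0],
     [-1, 0, 0, 1],
     [0, -1, 0, 1],
     [0, 0, -1, 1]]

def solve(K, f):
    """Gauss-Jordan elimination in exact fractions (K is assumed invertible)."""
    n = len(K)
    M = [[Fraction(v) for v in row] + [Fraction(b)] for row, b in zip(K, f)]
    for j in range(n):
        p = next(i for i in range(j, n) if M[i][j] != 0)
        M[j], M[p] = M[p], M[j]
        pivot = M[j][j]
        M[j] = [v / pivot for v in M[j]]
        for i in range(n):
            if i != j and M[i][j] != 0:
                factor = M[i][j]
                M[i] = [a - factor * b for a, b in zip(M[i], M[j])]
    return [row[n] for row in M]

S = Fraction(1)                                   # assumed unit current source

# Graph Laplacian A^T A (C = I), then ground node 4
AtA = [[sum(A[e][i] * A[e][j] for e in range(6)) for j in range(4)]
       for i in range(4)]
K = [row[:3] for row in AtA[:3]]

x = solve(K, [S, 0, 0]) + [Fraction(0)]           # potentials, with x4 = 0
y = [-sum(a * xi for a, xi in zip(row, x)) for row in A]   # Ohm's Law y = -Ax

print(x)   # potentials at the four nodes
print(y)   # currents on the six edges
```

With S = 1 the potentials come out as (1/2, 1/4, 1/4, 0) and the currents as (1/4, 1/4, 0, 1/2, 1/4, 1/4), matching the displays above.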


The same matrix $A^TA$ appears in least squares. Nature distributes the currents to minimize
the heat loss. Statistics chooses $\hat{x}$ to minimize the least squares error.


Problem Set 8.2 


Problems 1-7 and 8-14 are about the incidence matrices for these graphs. 


[Figure: the triangle graph with nodes 1, 2, 3 and edges 1, 2, 3, and the square graph with nodes 1, 2, 3, 4.]


1 Write down the 3 by 3 incidence matrix A for the triangle graph. The first row has
$-1$ in column 1 and $+1$ in column 2. What vectors $(x_1, x_2, x_3)$ are in its nullspace?
How do you know that (1, 0, 0) is not in its row space?


2 Write down $A^T$ for the triangle graph. Find a vector y in its nullspace. The components
of y are currents on the edges. How much current is going around the triangle?


3 Eliminate $x_1$ and $x_2$ from the third equation to find the echelon matrix U. What tree
corresponds to the two nonzero rows of U?


$$\begin{aligned} -x_1 + x_2 &= b_1\\ -x_1 + x_3 &= b_2\\ -x_2 + x_3 &= b_3.\end{aligned}$$


4 Choose a vector $(b_1, b_2, b_3)$ for which $Ax = b$ can be solved, and another vector b
that allows no solution. How are those b’s related to $y = (1, -1, 1)$?


5 Choose a vector $(f_1, f_2, f_3)$ for which $A^Ty = f$ can be solved, and a vector f
that allows no solution. How are those f’s related to x = (1, 1, 1)? The equation
$A^Ty = f$ is Kirchhoff’s law.


6 Multiply matrices to find $A^TA$. Choose a vector f for which $A^TAx = f$ can be
solved, and solve for x. Put those potentials x and the currents $y = -Ax$ and
current sources f onto the triangle graph. Conductances are 1 because C = I.


7 With conductances $c_1 = 1$ and $c_2 = c_3 = 2$, multiply matrices to find $A^TCA$. For
$f = (1, 0, -1)$ find a solution to $A^TCAx = f$. Write the potentials x and currents
$y = -CAx$ on the triangle graph, when the current source f goes into node 1 and
out from node 3.


8 Write down the 5 by 4 incidence matrix A for the square graph with two loops. Find
one solution to $Ax = 0$ and two solutions to $A^Ty = 0$.


9 Find two requirements on the b’s for the five differences $x_2 - x_1$, $x_3 - x_1$, $x_3 - x_2$,
$x_4 - x_2$, $x_4 - x_3$ to equal $b_1, b_2, b_3, b_4, b_5$. You have found Kirchhoff’s law
around the two ____ in the graph.


10 Reduce A to its echelon form U. The three nonzero rows give the incidence matrix
for what graph? You found one tree in the square graph. Find the other seven trees.


11 Multiply matrices to find $A^TA$ and guess how its entries come from the graph:


(a) The diagonal of $A^TA$ tells how many ____ into each node.


(b) The off-diagonals $-1$ or 0 tell which pairs of nodes are ____.

12 Why is each statement true about $A^TA$? Answer for $A^TA$ not A.


(a) Its nullspace contains (1, 1, 1, 1). Its rank is $n - 1$.

(b) It is positive semidefinite but not positive definite.


(c) Its four eigenvalues are real and their signs are ____.


13 With conductances $c_1 = c_2 = 2$ and $c_3 = c_4 = c_5 = 3$, multiply the matrices
$A^TCA$. Find a solution to $A^TCAx = f = (1, 0, 0, -1)$. Write these potentials x
and currents $y = -CAx$ on the nodes and edges of the square graph.


14 The matrix $A^TCA$ is not invertible. What vectors x are in its nullspace? Why does
$A^TCAx = f$ have a solution if and only if $f_1 + f_2 + f_3 + f_4 = 0$?


15 A connected graph with 7 nodes and 7 edges has how many loops?


16 For the graph with 4 nodes, 6 edges, and 3 loops, add a new node. If you connect it
to one old node, Euler’s formula becomes ( ) - ( ) + ( ) = 1. If you connect it
to two old nodes, Euler’s formula becomes ( ) - ( ) + ( ) = 1.


17 Suppose A is a 12 by 9 incidence matrix from a connected (but unknown) graph.


(a) How many columns of A are independent?

(b) What condition on f makes it possible to solve $A^Ty = f$?

(c) The diagonal entries of $A^TA$ give the number of edges into each node. What is
the sum of those diagonal entries?


18 Why does a complete graph with n = 6 nodes have m = 15 edges? A tree connecting
6 nodes has ____ edges.


Note The stoichiometric matrix in chemistry is an important “generalized” incidence
matrix. Its entries show how much of each chemical species (each column) goes into each
reaction (each row).


8.3 Markov Matrices, Population, and Economics


This section is about positive matrices: every $a_{ij} > 0$. The key fact is quick to state:
The largest eigenvalue is real and positive and so is its eigenvector. In economics
and ecology and population dynamics and random walks, that fact leads a long way:


Markov $\lambda_{\max} = 1$     Population $\lambda_{\max} > 1$     Consumption $\lambda_{\max} < 1$


$\lambda_{\max}$ controls the powers of A. We will see this first for $\lambda_{\max} = 1$.


Markov Matrices

Suppose we multiply a positive vector $u_0 = (a, 1 - a)$ again and again by this A:


$$\text{Markov matrix}\qquad A = \begin{bmatrix}.8 & .3\\ .2 & .7\end{bmatrix}\qquad u_1 = Au_0\qquad u_2 = Au_1 = A^2u_0$$

After k steps we have $A^ku_0$. The vectors $u_1, u_2, u_3, \ldots$ will approach a “steady state”
$u_\infty = (.6, .4)$. This final outcome does not depend on the starting vector: For every $u_0$ we
converge to the same $u_\infty$. The question is why.

The steady state equation $Au_\infty = u_\infty$ makes $u_\infty$ an eigenvector with eigenvalue 1:


$$\text{Steady state}\qquad\begin{bmatrix}.8 & .3\\ .2 & .7\end{bmatrix}\begin{bmatrix}.6\\ .4\end{bmatrix} = \begin{bmatrix}.6\\ .4\end{bmatrix} = u_\infty.$$


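Multiplying by A does not change $u_\infty$. But this does not explain why all vectors $u_0$ lead
to $u_\infty$. Other examples might have a steady state, but it is not necessarily attractive:


$$\text{Not Markov}\qquad B = \begin{bmatrix}1 & 0\\ 0 & 2\end{bmatrix}\quad\text{has the unattractive steady state}\quad B\begin{bmatrix}1\\ 0\end{bmatrix} = \begin{bmatrix}1\\ 0\end{bmatrix}.$$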


In this case, the starting vector $u_0 = (0, 1)$ will give $u_1 = (0, 2)$ and $u_2 = (0, 4)$. The
second components are doubled. In the language of eigenvalues, B has $\lambda = 1$ but also
$\lambda = 2$, and this produces instability. The component of u along that unstable eigenvector is
multiplied by $\lambda$, and $|\lambda| > 1$ means blowup.

This section is about two special properties of A that guarantee a stable steady state.
These properties define a Markov matrix, and A above is one particular example:


1. Every entry of A is nonnegative.
2. Every column of A adds to 1.


B did not have Property 2. When A is a Markov matrix, two facts are immediate:

1. Multiplying a nonnegative $u_0$ by A produces a nonnegative $u_1 = Au_0$.

2. If the components of $u_0$ add to 1, so do the components of $u_1 = Au_0$.

Reason: The components of $u_0$ add to 1 when $[\,1\ \cdots\ 1\,]\,u_0 = 1$. This is true for each
column of A by Property 2. Then by matrix multiplication $[\,1\ \cdots\ 1\,]\,A = [\,1\ \cdots\ 1\,]$:


Components of $Au_0$ add to 1: $[\,1\ \cdots\ 1\,]\,Au_0 = [\,1\ \cdots\ 1\,]\,u_0 = 1.$


The same facts apply to $u_2 = Au_1$ and $u_3 = Au_2$. Every vector $A^ku_0$ is nonnegative
with components adding to 1. These are “probability vectors.” The limit $u_\infty$ is also a
probability vector, but we have to prove that there is a limit. We will show that $\lambda_{\max} = 1$
for a positive Markov matrix.


Example 1 The fraction of rental cars in Denver starts at $\frac{1}{50} = .02$. The fraction outside
Denver is .98. Every month, 80% of the Denver cars stay in Denver (and 20% leave).
Also 5% of the outside cars come in (95% stay outside). This means that the fractions
$u_0 = (.02, .98)$ are multiplied by A:


$$\text{First month}\qquad A = \begin{bmatrix}.80 & .05\\ .20 & .95\end{bmatrix}\quad\text{leads to}\quad u_1 = Au_0 = A\begin{bmatrix}.02\\ .98\end{bmatrix} = \begin{bmatrix}.065\\ .935\end{bmatrix}.$$


Notice that .065 + .935 = 1. All cars are accounted for. Each step multiplies by A:

Next month $u_2 = Au_1 = (.09875, .90125)$. This is $A^2u_0$.


All these vectors are positive because A is positive. Each vector $u_k$ will have its components
adding to 1. The first component has grown from .02 and cars are moving toward
Denver. What happens in the long run?
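Before any theory, the long run can simply be computed. This sketch multiplies by the Denver matrix month after month in exact fractions; the function name `step` and the choice of 100 months are illustrative assumptions.

```python
from fractions import Fraction as F

A = [[F(80, 100), F(5, 100)],
     [F(20, 100), F(95, 100)]]

def step(A, u):
    """One month: multiply the fractions u by the Markov matrix A."""
    return [sum(a * x for a, x in zip(row, u)) for row in A]

u0 = [F(2, 100), F(98, 100)]
u1 = step(A, u0)
u2 = step(A, u1)

u = u0
sums_stay_one = True
for _ in range(100):
    u = step(A, u)
    sums_stay_one = sums_stay_one and sum(u) == 1   # all cars accounted for

print([float(v) for v in u1])   # first month
print([float(v) for v in u2])   # second month
print([float(v) for v in u])    # after 100 months
print(sums_stay_one)
```

The first two vectors agree with $u_1$ and $u_2$ above, every $u_k$ has components adding to 1, and after 100 months the fractions are indistinguishable from (.2, .8).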


This section involves powers of matrices. The understanding of $A^k$ was our first and
best application of diagonalization. Where $A^k$ can be complicated, the diagonal matrix $\Lambda^k$
is simple. The eigenvector matrix S connects them: $A^k$ equals $S\Lambda^kS^{-1}$. The new application
to Markov matrices uses the eigenvalues (in $\Lambda$) and the eigenvectors (in S). We will
show that $u_\infty$ is an eigenvector corresponding to $\lambda = 1$.

Since every column of A adds to 1, nothing is lost or gained. We are moving rental cars 
or populations, and no cars or people suddenly appear (or disappear). The fractions add to 
1 and the matrix A keeps them that way. The question is how they are distributed after k
time periods, which leads us to $A^k$.


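Solution $A^ku_0$ gives the fractions in and out of Denver after k steps. We diagonalize A to
understand $A^k$. The eigenvalues are $\lambda = 1$ and .75 (the trace is 1.75).


$$Ax = \lambda x\qquad A\begin{bmatrix}.2\\ .8\end{bmatrix} = \begin{bmatrix}.2\\ .8\end{bmatrix}\quad\text{and}\quad A\begin{bmatrix}-1\\ 1\end{bmatrix} = .75\begin{bmatrix}-1\\ 1\end{bmatrix}.$$

The starting vector $u_0$ combines $x_1$ and $x_2$, in this case with coefficients 1 and .18:


$$\text{Combination of eigenvectors}\qquad u_0 = \begin{bmatrix}.02\\ .98\end{bmatrix} = \begin{bmatrix}.2\\ .8\end{bmatrix} + .18\begin{bmatrix}-1\\ 1\end{bmatrix}.$$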


Now multiply by A to find $u_1$. The eigenvectors are multiplied by $\lambda_1 = 1$ and $\lambda_2 = .75$:


$$\text{Each } x \text{ is multiplied by } \lambda\qquad u_1 = 1\begin{bmatrix}.2\\ .8\end{bmatrix} + (.75)(.18)\begin{bmatrix}-1\\ 1\end{bmatrix}.$$

Every month, another .75 multiplies the vector $x_2$. The eigenvector $x_1$ is unchanged:


$$\text{After } k \text{ steps}\qquad u_k = A^ku_0 = \begin{bmatrix}.2\\ .8\end{bmatrix} + (.75)^k(.18)\begin{bmatrix}-1\\ 1\end{bmatrix}.$$


This equation reveals what happens. The eigenvector $x_1$ with $\lambda = 1$ is the steady state.
The other eigenvector $x_2$ disappears because $|\lambda| < 1$. The more steps we take, the closer
we come to $u_\infty = (.2, .8)$. In the limit, $\frac{2}{10}$ of the cars are in Denver and $\frac{8}{10}$ are outside.
This is the pattern for Markov chains, even starting from $u_0 = (0, 1)$:
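The eigenvector formula and plain repeated multiplication should agree at every step. The sketch below compares them exactly for the Denver matrix; the names `closed_form` and `iterate` are illustrative.

```python
from fractions import Fraction as F

A = [[F(4, 5), F(1, 20)],
     [F(1, 5), F(19, 20)]]

x1 = [F(1, 5), F(4, 5)]      # eigenvector for lambda = 1 (the steady state)
x2 = [F(-1), F(1)]           # eigenvector for lambda = .75
c2 = F(18, 100)              # coefficient of x2 in u0
lam2 = F(3, 4)

def closed_form(k):
    """u_k = x1 + (.75)^k (.18) x2."""
    return [a + c2 * lam2 ** k * b for a, b in zip(x1, x2)]

def iterate(k):
    """u_k computed by multiplying u0 by A, k times."""
    u = [F(1, 50), F(49, 50)]
    for _ in range(k):
        u = [sum(a * x for a, x in zip(row, u)) for row in A]
    return u

for k in (0, 1, 2, 10):
    print(k, closed_form(k) == iterate(k), [float(v) for v in closed_form(k)])
```

Because the comparison uses fractions, agreement here is exact and not a matter of rounding. The only term that changes with k is the multiple of $x_2$, which shrinks by the factor .75 each month.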


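If A is a positive Markov matrix (entries $a_{ij} > 0$, each column adds to 1), then
$\lambda_1 = 1$ is larger than any other eigenvalue. The eigenvector $x_1$ is the steady state:


$$u_k = x_1 + c_2(\lambda_2)^kx_2 + \cdots + c_n(\lambda_n)^kx_n\quad\text{always approaches}\quad u_\infty = x_1.$$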


The first point is to see that \(\lambda = 1\) is an eigenvalue of A. Reason: Every column of
A − I adds to 1 − 1 = 0. The rows of A − I add up to the zero row. Those rows are linearly
dependent, so A − I is singular. Its determinant is zero and \(\lambda = 1\) is an eigenvalue.

The second point is that no eigenvalue can have \(|\lambda| > 1\). With such an eigenvalue,
the powers \(A^k\) would grow. But \(A^k\) is also a Markov matrix! \(A^k\) has nonnegative entries
still adding to 1, and that leaves no room to get large.


A lot of attention is paid to the possibility that another eigenvalue has \(|\lambda| = 1\).
Example 2 \(A = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}\) has no steady state because \(\lambda_2 = -1\).


This matrix sends all cars from inside Denver to outside, and vice versa.
The powers \(A^k\) alternate between A and I. The second eigenvector \(x_2 = (-1, 1)\) will be
multiplied by \(\lambda_2 = -1\) at every step, and does not become smaller: No steady state.


Suppose the entries of A or any power of A are all positive (zero is not allowed).
In this “regular” or “primitive” case, \(\lambda = 1\) is strictly larger in absolute value than any
other eigenvalue. The powers \(A^k\) approach the rank one matrix that has the steady state in
every column.


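Example 3 (“Everybody moves”) Start with three groups. At each time step, half of
group 1 goes to group 2 and the other half goes to group 3. The other groups also split in
half and move. Take one step from the starting populations \(p_1, p_2, p_3\):


New populations
\[u_1 = Au_0 = \begin{bmatrix} 0 & \tfrac12 & \tfrac12 \\ \tfrac12 & 0 & \tfrac12 \\ \tfrac12 & \tfrac12 & 0 \end{bmatrix}\begin{bmatrix} p_1 \\ p_2 \\ p_3 \end{bmatrix} = \begin{bmatrix} \tfrac12p_2 + \tfrac12p_3 \\ \tfrac12p_1 + \tfrac12p_3 \\ \tfrac12p_1 + \tfrac12p_2 \end{bmatrix}\]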


A is a Markov matrix. Nobody is born or lost. A contains zeros, which gave trouble in
Example 2. But after two steps in this new example, the zeros disappear from \(A^2\):


Two-step matrix
\[u_2 = A^2u_0 = \begin{bmatrix} \tfrac12 & \tfrac14 & \tfrac14 \\ \tfrac14 & \tfrac12 & \tfrac14 \\ \tfrac14 & \tfrac14 & \tfrac12 \end{bmatrix}\begin{bmatrix} p_1 \\ p_2 \\ p_3 \end{bmatrix}\]

The eigenvalues of A are \(\lambda_1 = 1\) (because A is Markov) and \(\lambda_2 = \lambda_3 = -\tfrac12\). For \(\lambda = 1\),
the eigenvector \(x_1 = (\tfrac13, \tfrac13, \tfrac13)\) will be the steady state. When three equal populations
split in half and move, the populations are again equal. Starting from \(u_0 = (8, 16, 32)\),
the Markov chain approaches its steady state:


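\[u_0 = \begin{bmatrix} 8 \\ 16 \\ 32 \end{bmatrix} \qquad u_1 = \begin{bmatrix} 24 \\ 20 \\ 12 \end{bmatrix} \qquad u_2 = \begin{bmatrix} 16 \\ 18 \\ 22 \end{bmatrix} \qquad u_3 = \begin{bmatrix} 20 \\ 19 \\ 17 \end{bmatrix}\]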


The step to \(u_4\) will split some people in half. This cannot be helped. The total population
is 8 + 16 + 32 = 56 at every step. The steady state is 56 times \((\tfrac13, \tfrac13, \tfrac13)\). You can see the
three populations approaching, but never reaching, their final limits 56/3.
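The chain in Example 3 is easy to follow by machine. The sketch below uses exact fractions so that the half-people at \(u_4\) show up honestly; the helper names `step` and `chain` are ours, not part of any library.

```python
from fractions import Fraction

half = Fraction(1, 2)
# "Everybody moves": each group splits in half and goes to the other two.
A = [[0, half, half],
     [half, 0, half],
     [half, half, 0]]


def step(u):
    """One time step u -> A u."""
    return [sum(a * x for a, x in zip(row, u)) for row in A]


def chain(u0, steps):
    """Return the list [u_0, u_1, ..., u_steps]."""
    history = [list(u0)]
    for _ in range(steps):
        history.append(step(history[-1]))
    return history


history = chain([Fraction(8), Fraction(16), Fraction(32)], 12)
for k, u in enumerate(history[:5]):
    print(k, [str(v) for v in u], "total", sum(u))
```

Each step halves the distance to 56/3, because the other eigenvalues are \(-\tfrac12\). The total stays at 56.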


Challenge Problem 6.7.16 created a Markov matrix A from the number of links between
websites. The steady state \(u_\infty\) will give the Google rankings. Google finds \(u_\infty\) by a
random walk that follows links (random surfing). That eigenvector comes from counting
the fraction of visits to each website, a quick way to compute the steady state.


The size \(|\lambda_2|\) of the next largest eigenvalue controls the speed of convergence to steady
state.


Perron-Frobenius Theorem 


One matrix theorem dominates this subject. The Perron-Frobenius Theorem applies when
all \(a_{ij} \ge 0\). There is no requirement that columns add to 1. We prove the neatest form,
when all \(a_{ij} > 0\).


Perron-Frobenius for A > 0: All numbers in \(Ax = \lambda_{\max}x\) are strictly positive.


Proof The key idea is to look at all numbers t such that \(Ax \ge tx\) for some nonnegative
vector x (other than x = 0). We are allowing inequality in \(Ax \ge tx\) in order to have
many positive candidates t. For the largest value \(t_{\max}\) (which is attained), we will show
that equality holds: \(Ax = t_{\max}x\).


Otherwise, if \(Ax \ge t_{\max}x\) is not an equality, multiply by A. Because A is positive
that produces a strict inequality \(A^2x > t_{\max}Ax\). Therefore the positive vector y = Ax
satisfies \(Ay > t_{\max}y\), and \(t_{\max}\) could be increased. This contradiction forces the equality
\(Ax = t_{\max}x\), and we have an eigenvalue. Its eigenvector x is positive because on the left
side of that equality, Ax is sure to be positive.

To see that no eigenvalue can be larger than \(t_{\max}\), suppose \(Az = \lambda z\). Since \(\lambda\) and z
may involve negative or complex numbers, we take absolute values: \(|\lambda||z| = |Az| \le A|z|\)
by the “triangle inequality.” This |z| is a nonnegative vector, so \(|\lambda|\) is one of the possible
candidates t. Therefore \(|\lambda|\) cannot exceed \(t_{\max}\), which must be \(\lambda_{\max}\).


Population Growth


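Divide the population into three age groups: age < 20, age 20 to 39, and age 40 to 59.
At year T the sizes of those groups are \(n_1, n_2, n_3\). Twenty years later, the sizes have
changed for two reasons:


1. Reproduction \(n_1^{\text{new}} = F_1n_1 + F_2n_2 + F_3n_3\) gives a new generation
2. Survival \(n_2^{\text{new}} = P_1n_1\) and \(n_3^{\text{new}} = P_2n_2\) gives the older generations


The fertility rates are \(F_1, F_2, F_3\) (\(F_2\) largest). The Leslie matrix A might look like this:


\[\begin{bmatrix} n_1 \\ n_2 \\ n_3 \end{bmatrix}^{\text{new}} = \begin{bmatrix} F_1 & F_2 & F_3 \\ P_1 & 0 & 0 \\ 0 & P_2 & 0 \end{bmatrix}\begin{bmatrix} n_1 \\ n_2 \\ n_3 \end{bmatrix} = \begin{bmatrix} .04 & 1.1 & .01 \\ .98 & 0 & 0 \\ 0 & .92 & 0 \end{bmatrix}\begin{bmatrix} n_1 \\ n_2 \\ n_3 \end{bmatrix}\]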


This is population projection in its simplest form, the same matrix A at every step. In 
a realistic model, A will change with time (from the environment or internal factors). 
Professors may want to include a fourth group, age > 60, but we don’t allow it. 


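The matrix has \(A \ge 0\) but not A > 0. The Perron-Frobenius theorem still applies
because \(A^3 > 0\). The largest eigenvalue is \(\lambda_{\max} \approx 1.06\). You can watch the generations
move, starting from \(n_2 = 1\) in the middle generation:


\[\operatorname{eig}(A) = \begin{matrix} 1.06 \\ -1.01 \\ -0.01 \end{matrix} \qquad A^2 = \begin{bmatrix} 1.08 & 0.05 & .00 \\ 0.04 & 1.08 & .01 \\ 0.90 & 0 & 0 \end{bmatrix} \qquad A^3 = \begin{bmatrix} 0.10 & 1.19 & .01 \\ 1.06 & 0.05 & .00 \\ 0.04 & 0.99 & .01 \end{bmatrix}\]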


A fast start would come from \(u_0 = (0, 1, 0)\). That middle group will reproduce 1.1 and
also survive .92. The newest and oldest generations are in \(u_1 = (1.1, 0, .92)\) = column 2
of A. Then \(u_2 = Au_1 = A^2u_0\) is the second column of \(A^2\). The early numbers (transients)
depend a lot on \(u_0\), but the asymptotic growth rate \(\lambda_{\max}\) is the same from every start.
Its eigenvector x = (.63, .58, .51) shows all three groups growing steadily together.
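One way to watch the growth rate emerge is to multiply by A over and over and rescale at each step. This is the power method, which is our choice for illustration and not a method described in this section. The sketch below assumes the Leslie matrix shown above; the function name `growth_rate` and the step count are ours. Convergence is slow here because the second eigenvalue (about −1.01) is close in size to \(\lambda_{\max}\), so we take many steps.

```python
# Leslie matrix from the text: fertility in row 1, survival below the diagonal.
A = [[0.04, 1.10, 0.01],
     [0.98, 0.00, 0.00],
     [0.00, 0.92, 0.00]]


def matvec(matrix, v):
    """Matrix times column vector."""
    return [sum(a * x for a, x in zip(row, v)) for row in matrix]


def growth_rate(matrix, start, steps=2000):
    """Power method for a nonnegative matrix and a nonnegative start.

    Returns (estimate of lambda_max, unit-length eigenvector estimate).
    """
    total = sum(start)
    v = [x / total for x in start]        # entries of v add to 1
    lam = 0.0
    for _ in range(steps):
        w = matvec(matrix, v)
        lam = sum(w)                       # growth factor, since sum(v) = 1
        v = [x / lam for x in w]           # rescale so the entries add to 1
    length = sum(x * x for x in v) ** 0.5
    return lam, [x / length for x in v]


lam, x = growth_rate(A, [0.0, 1.0, 0.0])   # start in the middle generation
print(round(lam, 4), [round(c, 2) for c in x])
lam_other, _ = growth_rate(A, [1.0, 0.0, 0.0])
print(round(lam_other, 4))
```

Both starts give the same growth rate, which is the point of the paragraph above: the transients depend on \(u_0\) but the asymptotic rate does not.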


Caswell’s book on Matrix Population Models emphasizes sensitivity analysis. The
model is never exactly right. If the F’s or P’s in the matrix change by 10%, does \(\lambda_{\max}\)
go below 1 (which means extinction)? Problem 19 will show that a matrix change \(\Delta A\)
produces an eigenvalue change \(\Delta\lambda = y^T(\Delta A)x\). Here x and \(y^T\) are the right and left
eigenvectors of A. So x is a column of S and \(y^T\) is a row of \(S^{-1}\).


Linear Algebra in Economics: The Consumption Matrix 


A long essay about linear algebra in economics would be out of place here. A short note 
about one matrix seems reasonable. The consumption matrix tells how much of each input 
goes into a unit of output. This describes the manufacturing side of the economy. 




Consumption matrix We have n industries like chemicals, food, and oil. To produce a
unit of chemicals may require .2 units of chemicals, .3 units of food, and .4 units of oil.
Those numbers go into row 1 of the consumption matrix A:


\[\begin{bmatrix} \text{chemical output} \\ \text{food output} \\ \text{oil output} \end{bmatrix} = \begin{bmatrix} .2 & .3 & .4 \\ .4 & .4 & .1 \\ .5 & .1 & .3 \end{bmatrix}\begin{bmatrix} \text{chemical input} \\ \text{food input} \\ \text{oil input} \end{bmatrix}\]


Row 2 shows the inputs to produce food—a heavy use of chemicals and food, not so much 
oil. Row 3 of A shows the inputs consumed to refine a unit of oil. The real consumption 
matrix for the United States in 1958 contained 83 industries. The models in the 1990’s 
are much larger and more precise. We chose a consumption matrix that has a convenient 
eigenvector. 

Now comes the question: Can this economy meet demands \(y_1, y_2, y_3\) for chemicals,
food, and oil? To do that, the inputs \(p_1, p_2, p_3\) will have to be higher, because part of p
is consumed in producing y. The input is p and the consumption is Ap, which leaves the
output p − Ap. This net production is what meets the demand y:


Problem Find a vector p such that p − Ap = y or \(p = (I - A)^{-1}y\).


Apparently the linear algebra question is whether I − A is invertible. But there is more
to the problem. The demand vector y is nonnegative, and so is A. The production levels in
\(p = (I - A)^{-1}y\) must also be nonnegative. The real question is:


When is \((I - A)^{-1}\) a nonnegative matrix?


This is the test on \((I - A)^{-1}\) for a productive economy, which can meet any positive
demand. If A is small compared to I, then Ap is small compared to p. There is plenty
of output. If A is too large, then production consumes more than it yields. In this case the
external demand y cannot be met.


“Small” or “large” is decided by the largest eigenvalue \(\lambda_1\) of A (which is positive):


If \(\lambda_1 > 1\) then \((I - A)^{-1}\) has negative entries
If \(\lambda_1 = 1\) then \((I - A)^{-1}\) fails to exist
If \(\lambda_1 < 1\) then \((I - A)^{-1}\) is nonnegative as desired.


The main point is that last one. The reasoning uses a nice formula for \((I - A)^{-1}\), which
we give now. The most important infinite series in mathematics is the geometric series
\(1 + x + x^2 + \cdots\). This series adds up to 1/(1 − x) provided x lies between −1 and 1.
When x = 1 the series is \(1 + 1 + 1 + \cdots = \infty\). When \(|x| \ge 1\) the terms \(x^n\) don’t go to
zero and the series has no chance to converge.

The nice formula for \((I - A)^{-1}\) is the geometric series of matrices:


Geometric series \[(I - A)^{-1} = I + A + A^2 + A^3 + \cdots\]

If you multiply the series \(S = I + A + A^2 + \cdots\) by A, you get the same series except
for I. Therefore S − AS = I, which is (I − A)S = I. The series adds to \(S = (I - A)^{-1}\)
if it converges. And it converges if all eigenvalues of A have \(|\lambda| < 1\).

In our case \(A \ge 0\). All terms of the series are nonnegative. Its sum is \((I - A)^{-1} \ge 0\).


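Example 4
\[A = \begin{bmatrix} .2 & .3 & .4 \\ .4 & .4 & .1 \\ .5 & .1 & .3 \end{bmatrix} \text{ has } \lambda_{\max} = .9 \text{ and } (I - A)^{-1} = \frac{1}{93}\begin{bmatrix} 41 & 25 & 27 \\ 33 & 36 & 24 \\ 34 & 23 & 36 \end{bmatrix}.\]


This economy is productive. A is small compared to I, because \(\lambda_{\max}\) is .9. To meet the
demand y, start from \(p = (I - A)^{-1}y\). Then Ap is consumed in production, leaving
p − Ap. This is (I − A)p = y, and the demand is met.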

Example 5 \(A = \begin{bmatrix} 0 & 4 \\ 1 & 0 \end{bmatrix}\) has \(\lambda_{\max} = 2\) and \((I - A)^{-1} = -\dfrac{1}{3}\begin{bmatrix} 1 & 4 \\ 1 & 1 \end{bmatrix}\).


This consumption matrix A is too large. Demands can’t be met, because production consumes
more than it yields. The series \(I + A + A^2 + \cdots\) does not converge to \((I - A)^{-1}\)
because \(\lambda_{\max} > 1\). The series is growing while \((I - A)^{-1}\) is actually negative.

In the same way \(1 + 2 + 4 + \cdots\) is not really 1/(1 − 2) = −1. But not entirely false!


Problem Set 8.3 


Questions 1-12 are about Markov matrices and their eigenvalues and powers. 
1 Find the eigenvalues of this Markov matrix (their sum is the trace):
\[A = \begin{bmatrix} .90 & .15 \\ .10 & .85 \end{bmatrix}.\]
What is the steady state eigenvector for the eigenvalue \(\lambda_1 = 1\)?


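2 Diagonalize the Markov matrix in Problem 1 to \(A = S\Lambda S^{-1}\) by finding its other
eigenvector:
\[A = \begin{bmatrix} \quad & \quad \\ \quad & \quad \end{bmatrix}\begin{bmatrix} 1 & 0 \\ 0 & .75 \end{bmatrix}\begin{bmatrix} \quad & \quad \\ \quad & \quad \end{bmatrix}.\]
What is the limit of \(A^k = S\Lambda^kS^{-1}\) when \(\Lambda^k = \begin{bmatrix} 1 & 0 \\ 0 & .75^k \end{bmatrix}\) approaches \(\begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}\)?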


3 What are the eigenvalues and steady state eigenvectors for these Markov matrices? 


s=[) 3] [2 J 


4 For every 4 by 4 Markov matrix, what eigenvector of AT corresponds to the (known) 
eigenvalue A = 1? 


bl Bim wl 
Ale Nise Ale 
Nie Ble Ale 
5 Every year 2% of young people become old and 3% of old people become dead.
(No births.) Find the steady state for
\[\begin{bmatrix} \text{young} \\ \text{old} \\ \text{dead} \end{bmatrix}_{k+1} = \begin{bmatrix} .98 & .00 & 0 \\ .02 & .97 & 0 \\ .00 & .03 & 1 \end{bmatrix}\begin{bmatrix} \text{young} \\ \text{old} \\ \text{dead} \end{bmatrix}_k.\]


6 For a Markov matrix, the sum of the components of x equals the sum of the components
of Ax. If \(Ax = \lambda x\) with \(\lambda \ne 1\), prove that the components of this non-steady
eigenvector x add to zero.


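7 Find the eigenvalues and eigenvectors of A. Explain why \(A^k\) approaches \(A^\infty\):
\[A = \begin{bmatrix} .8 & .3 \\ .2 & .7 \end{bmatrix} \qquad A^\infty = \begin{bmatrix} .6 & .6 \\ .4 & .4 \end{bmatrix}.\]

Challenge problem: Which Markov matrices produce that steady state (.6, .4)?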


8 The steady state eigenvector of a permutation matrix is \((\tfrac14, \tfrac14, \tfrac14, \tfrac14)\). This is not
approached when \(u_0 = (0, 0, 0, 1)\). What are \(u_1\) and \(u_2\) and \(u_3\) and \(u_4\)? What are
the four eigenvalues of P, which solve \(\lambda^4 = 1\)?


Permutation matrix = Markov matrix P= 


=- OC SO 
ooor 
O Om- © 
O- © © 


9 Prove that the square of a Markov matrix is also a Markov matrix.


10 If \(A = \begin{bmatrix} a & b \\ 1-a & 1-b \end{bmatrix}\) is a Markov matrix, its eigenvalues are 1 and ____. The steady state
eigenvector is \(x_1\) = ____.


11 Complete A to a Markov matrix and find the steady state eigenvector. When A is a
symmetric Markov matrix, why is \(x_1 = (1, \ldots, 1)\) its steady state?
\[A = \begin{bmatrix} .7 & .1 & .2 \\ .1 & .6 & .3 \\ \_ & \_ & \_ \end{bmatrix}\]

12 A Markov differential equation is not du/dt = Au but du/dt = (A − I)u. The
diagonal is negative, the rest of A − I is positive. The columns add to zero.


Find the eigenvalues of B = A — I = E 3 


Why does A − I have \(\lambda = 0\)?


When \(e^{\lambda_1t}\) and \(e^{\lambda_2t}\) multiply \(x_1\) and \(x_2\), what is the steady state as \(t \to \infty\)?

Questions 13-15 are about linear algebra in economics.


13 Each row of the consumption matrix in Example 4 adds to .9. Why does that make
\(\lambda = .9\) an eigenvalue, and what is the eigenvector?


14 Multiply \(I + A + A^2 + A^3 + \cdots\) by I − A to show that the series adds to ____.
For A = [? h ], find A? and A? and use the pattern to add up the series. 


15 For which of these matrices does \(I + A + A^2 + \cdots\) yield a nonnegative matrix
\((I - A)^{-1}\)? Then the economy can meet any demand:


BJ RI fe 


If the demands are y = (2, 6), what are the vectors \(p = (I - A)^{-1}y\)?


16 (Markov again) This matrix has zero determinant. What are its eigenvalues?
\[A = \begin{bmatrix} .4 & .2 & .3 \\ .2 & .4 & .3 \\ .4 & .4 & .4 \end{bmatrix}\]


Find the limits of \(A^ku_0\) starting from \(u_0 = (1, 0, 0)\) and then \(u_0 = (100, 0, 0)\).

17 If A is a Markov matrix, does \(I + A + A^2 + \cdots\) add up to \((I - A)^{-1}\)?


18 For the Leslie matrix show that \(\det(A - \lambda I) = 0\) gives \(F_1\lambda^2 + F_2P_1\lambda + F_3P_1P_2 = \lambda^3\).
The right side \(\lambda^3\) is larger as \(\lambda \to \infty\). The left side is larger at \(\lambda = 1\) if
\(F_1 + F_2P_1 + F_3P_1P_2 > 1\). In that case the two sides are equal at an eigenvalue
\(\lambda_{\max} > 1\): growth.


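19 Sensitivity of eigenvalues: A matrix change \(\Delta A\) produces eigenvalue changes \(\Delta\Lambda\).
The formula for those changes \(\Delta\lambda_1, \ldots, \Delta\lambda_n\) is \(\operatorname{diag}(S^{-1}\,\Delta A\,S)\). Challenge:


Start from \(AS = S\Lambda\). The eigenvectors and eigenvalues change by \(\Delta S\) and \(\Delta\Lambda\):
\[(A + \Delta A)(S + \Delta S) = (S + \Delta S)(\Lambda + \Delta\Lambda) \quad\text{becomes}\quad A(\Delta S) + (\Delta A)S = S(\Delta\Lambda) + (\Delta S)\Lambda.\]


Small terms \((\Delta A)(\Delta S)\) and \((\Delta S)(\Delta\Lambda)\) are ignored. Multiply the last equation by
\(S^{-1}\). From the inner terms, the diagonal part of \(S^{-1}(\Delta A)S\) gives \(\Delta\Lambda\) as we want.
Why do the outer terms \(S^{-1}A\,\Delta S\) and \(S^{-1}\,\Delta S\,\Lambda\) cancel on the diagonal?


Explain \(S^{-1}A = \Lambda S^{-1}\) and then \(\operatorname{diag}(\Lambda S^{-1}\,\Delta S) = \operatorname{diag}(S^{-1}\,\Delta S\,\Lambda)\).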


20 Suppose B > A > 0, meaning that each \(b_{ij} > a_{ij} > 0\). How does the Perron-Frobenius
discussion show that \(\lambda_{\max}(B) > \lambda_{\max}(A)\)?


8.4 Linear Programming


Linear programming is linear algebra plus two new ideas: inequalities and minimization.
The starting point is still a matrix equation Ax = b. But the only acceptable solutions
are nonnegative. We require \(x \ge 0\) (meaning that no component of x can be negative).
The matrix has n > m, more unknowns than equations. If there are any solutions \(x \ge 0\)
to Ax = b, there are probably a lot. Linear programming picks the solution \(x^* \ge 0\)
that minimizes the cost:

The cost is \(c_1x_1 + \cdots + c_nx_n\). The winning vector \(x^*\) is
the nonnegative solution of Ax = b that has smallest cost.


Thus a linear programming problem starts with a matrix A and two vectors b and c:
i) A has n > m: for example A = [1 1 2] (one equation, three unknowns)
ii) b has m components for m equations Ax = b: for example b = [4]
iii) The cost vector c has n components: for example c = [5 3 8].


Then the problem is to minimize \(c \cdot x\) subject to the requirements Ax = b and \(x \ge 0\):
Minimize \(5x_1 + 3x_2 + 8x_3\) subject to \(x_1 + x_2 + 2x_3 = 4\) and \(x_1, x_2, x_3 \ge 0\).


We jumped right into the problem, without explaining where it comes from. Linear programming
is actually the most important application of mathematics to management. Development
of the fastest algorithm and fastest code is highly competitive. You will see that
finding \(x^*\) is harder than solving Ax = b, because of the extra requirements: \(x^* \ge 0\) and
minimum cost \(c^Tx^*\). We will explain the background, and the famous simplex method, and
interior point methods, after solving the example.

Look first at the “constraints”: Ax = b and \(x \ge 0\). The equation \(x_1 + x_2 + 2x_3 = 4\)
gives a plane in three dimensions. The nonnegativity \(x_1 \ge 0, x_2 \ge 0, x_3 \ge 0\) chops the
plane down to a triangle. The solution \(x^*\) must lie in the triangle PQR in Figure 8.6.

Inside that triangle, all components of x are positive. On the edges of PQR,
one component is zero. At the corners P and Q and R, two components are zero. The
optimal solution \(x^*\) will be one of those corners! We will now show why.

The triangle contains all vectors x that satisfy Ax = b and \(x \ge 0\). Those x’s are called
feasible points, and the triangle is the feasible set. These points are the allowed candidates
in the minimization of \(c \cdot x\), which is the final step:


Find x* in the triangle PQR to minimize the cost 5x1 + 3x2 + 8x3. 


The vectors that have zero cost lie on the plane \(5x_1 + 3x_2 + 8x_3 = 0\). That plane does
not meet the triangle. We cannot achieve zero cost, while meeting the requirements on x.
So increase the cost C until the plane \(5x_1 + 3x_2 + 8x_3 = C\) does meet the triangle.
As C increases, we have parallel planes moving toward the triangle.

[Figure 8.6 (example with four homework problems): the plane \(x_1 + x_2 + 2x_3 = 4\) is cut
down to the triangle PQR by \(x_1 \ge 0, x_2 \ge 0, x_3 \ge 0\). The corners have 2 zero components:
P = (4, 0, 0) (4 hours by Ph.D.), Q = (0, 4, 0) (4 hours by student), R = (0, 0, 2) (2 hours
by computer). The cost is \(c^Tx = 5x_1 + 3x_2 + 8x_3\).]


Figure 8.6: The triangle contains all nonnegative solutions: Ax = b and \(x \ge 0\). The
lowest cost solution \(x^*\) is a corner P, Q, or R of this feasible set.


The first plane \(5x_1 + 3x_2 + 8x_3 = C\) to touch the triangle has minimum cost C.
The point where it touches is the solution \(x^*\). This touching point must be one of the
corners P or Q or R. A moving plane could not reach the inside of the triangle before it
touches a corner! So check the cost \(5x_1 + 3x_2 + 8x_3\) at each corner:


P = (4, 0, 0) costs 20     Q = (0, 4, 0) costs 12     R = (0, 0, 2) costs 16.


The winner is Q. Then x* = (0, 4, 0) solves the linear programming problem. 

If the cost vector c is changed, the parallel planes are tilted. For small changes, Q
is still the winner. For the cost \(c \cdot x = 5x_1 + 4x_2 + 7x_3\), the optimum \(x^*\) moves to
R = (0, 0, 2). The minimum cost is now 7 · 2 = 14.
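Checking corners is simple enough to automate for this one-equation example. The sketch below lists the corners of the feasible set for a single constraint \(a \cdot x = b\) and picks the cheapest. It assumes what is true here: every \(a_i\) is positive and b is positive, so the feasible set is a bounded triangle and the minimum is at a corner. The function names are ours.

```python
def corners_one_equation(a, b):
    """Corners of {x >= 0, a . x = b} for one equation.

    Each corner has a single nonzero component x_i = b / a_i.
    ASSUMPTION: the entries of a used here are nonzero with b / a_i >= 0.
    """
    corners = []
    for i, a_i in enumerate(a):
        if a_i != 0 and b / a_i >= 0:
            x = [0.0] * len(a)
            x[i] = b / a_i
            corners.append(x)
    return corners


def cost(c, x):
    """The linear cost c . x."""
    return sum(c_i * x_i for c_i, x_i in zip(c, x))


def cheapest_corner(a, b, c):
    """Return (corner, cost) for the corner of lowest cost."""
    best = min(corners_one_equation(a, b), key=lambda x: cost(c, x))
    return best, cost(c, best)


a, b = [1, 1, 2], 4
print(cheapest_corner(a, b, [5, 3, 8]))   # original costs: student wins at Q
print(cheapest_corner(a, b, [5, 4, 7]))   # tilted costs: machine wins at R
```

With an unbounded feasible set (Notes 3 and 4 below) a corner check like this is not enough, since the cost may have no minimum at all.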


Note 1 Some linear programs maximize profit instead of minimizing cost. The mathemat- 
ics is almost the same. The parallel planes start with a large value of C, instead of a small 
value. They move toward the origin (instead of away), as C gets smaller. The first touching 
point is still a corner. 


Note 2 The requirements Ax = b and \(x \ge 0\) could be impossible to satisfy. The equation
\(x_1 + x_2 + x_3 = -1\) cannot be solved with \(x \ge 0\). That feasible set is empty.


Note 3 It could also happen that the feasible set is unbounded. If the requirement is
\(x_1 + x_2 - 2x_3 = 4\), the large positive vector (100, 100, 98) is now a candidate. So is
the larger vector (1000, 1000, 998). The plane Ax = b is no longer chopped off to a
triangle. The two corners P and Q are still candidates for \(x^*\), but R moved to infinity.


Note 4 With an unbounded feasible set, the minimum cost could be \(-\infty\) (minus infinity).
Suppose the cost is \(-x_1 - x_2 + x_3\). Then the vector (100, 100, 98) costs C = −102.
The vector (1000, 1000, 998) costs C = −1002. We are being paid to include \(x_1\) and \(x_2\),
instead of paying a cost. In realistic applications this will not happen. But it is theoretically
possible that A, b, and c can produce unexpected triangles and costs.


The Primal and Dual Problems


This first problem will fit A, b, c in that example. The unknowns \(x_1, x_2, x_3\) represent hours
of work by a Ph.D. and a student and a machine. The costs per hour are $5, $3, and $8.
(I apologize for such low pay.) The number of hours cannot be negative: \(x_1 \ge 0, x_2 \ge 0,
x_3 \ge 0\). The Ph.D. and the student get through one homework problem per hour. The
machine solves two problems in one hour. In principle they can share out the homework,
which has four problems to be solved: \(x_1 + x_2 + 2x_3 = 4\).


The problem is to finish the four problems at minimum cost \(c^Tx\).


If all three are working, the job takes one hour: x; = x2 = x3 = 1. The cost is 
5+3+8 = 16. But certainly the Ph.D. should be put out of work by the student (who 
is just as fast and costs less—this problem is getting realistic). When the student works 
two hours and the machine works one, the cost is 6 + 8 and all four problems get solved. 
We are on the edge QR of the triangle because the Ph.D. is not working: \(x_1 = 0\).
But the best point is all work by student (at Q) or all work by machine (at R). In 
this example the student solves four problems in four hours for $12—the minimum cost. 


With only one equation in Ax = b, the corner (0, 4, 0) has only one nonzero
component. When Ax = b has m equations, corners have m nonzeros. We solve
Ax = b for those m variables, with n − m free variables set to zero. But unlike Chapter 3,
we don’t know which m variables to choose.

The number of possible corners is the number of ways to choose m components out
of n. This number “n choose m” is heavily involved in gambling and probability. With
n = 20 unknowns and m = 8 equations (still small numbers), the “feasible set” can have
20!/8!12! corners. That number is \((20)(19)\cdots(13)/8! = 125{,}970\).

Checking three corners for the minimum cost was fine. Checking more than a hundred
thousand corners is not the way to go, and the count grows rapidly with n and m.
The simplex method described below is much faster.


The Dual Problem In linear programming, problems come in pairs. There is a minimum
problem and a maximum problem, the original and its “dual.” The original problem was
specified by a matrix A and two vectors b and c. The dual problem transposes A and
switches b and c: Maximize \(b \cdot y\). Here is the dual to our example:


A cheater offers to solve homework problems by selling the answers.
The charge is y dollars per problem, or 4y altogether. (Note how b = 4
has gone into the cost.) The cheater must be as cheap as the Ph.D. or student
or machine: \(y \le 5\) and \(y \le 3\) and \(2y \le 8\). (Note how c = (5, 3, 8) has gone
into inequality constraints.) The cheater maximizes the income 4y.


Maximize \(b \cdot y\) subject to \(A^Ty \le c\).


The maximum occurs when y = 3. The income is 4y = 12. The maximum in the dual
problem ($12) equals the minimum in the original ($12). Max = min is duality.

This book started with a row picture and a column picture. The first “duality theorem” was
about rank: The number of independent rows equals the number of independent columns.
That theorem, like this one, was easy for small matrices. Minimum cost = maximum
income is proved in our text Linear Algebra and Its Applications. One line will establish
the easy half of the theorem: The cheater’s income \(b^Ty\) cannot exceed the honest cost:


If Ax = b, \(x \ge 0\), \(A^Ty \le c\) then \(b^Ty = (Ax)^Ty = x^T(A^Ty) \le x^Tc\). (1)
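The cheater's problem and the inequality (1) can both be checked on the homework example. In the sketch below the dual has a single unknown y, so the largest allowed price is just the smallest ratio \(c_i/a_i\). That shortcut assumes every \(a_i\) is positive, as it is here. The sample feasible vectors are our own choices for illustration.

```python
a = [1, 1, 2]      # problems solved per hour: Ph.D., student, machine
b = 4              # homework problems to be solved
c = [5, 3, 8]      # cost per hour


def dual_optimum(a, b, c):
    """Largest y with a_i * y <= c_i for every i, and the income b * y.

    ASSUMPTION: one constraint, all a_i > 0 and b > 0.
    """
    y = min(c_i / a_i for a_i, c_i in zip(a, c))
    return y, b * y


def primal_cost(x):
    """Honest cost c . x of a work schedule x."""
    return sum(c_i * x_i for c_i, x_i in zip(c, x))


def is_feasible(x):
    """Check a . x = b and x >= 0."""
    return all(x_i >= 0 for x_i in x) and sum(a_i * x_i for a_i, x_i in zip(a, x)) == b


y_best, income = dual_optimum(a, b, c)
print("cheater's price", y_best, "income", income)

# Weak duality: every feasible schedule costs at least the cheater's income.
samples = [[1, 1, 1], [0, 2, 1], [4, 0, 0], [0, 4, 0], [0, 0, 2]]
for x in samples:
    print(x, is_feasible(x), primal_cost(x))
```

Only the schedule (0, 4, 0) brings the honest cost down to the cheater's income of 12.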


The full duality theorem says that when \(b^Ty\) reaches its maximum and \(x^Tc\) reaches its
minimum, they are equal: \(b \cdot y^* = c \cdot x^*\). Look at the last step in (1), with \(\le\) sign:


The dot product of \(x \ge 0\) and \(s = c - A^Ty \ge 0\) gave \(x^Ts \ge 0\). This is \(x^TA^Ty \le x^Tc\).


Equality needs \(x^Ts = 0\). So the optimum has \(x_j^* = 0\) or \(s_j^* = 0\) for each j.


The Simplex Method 


Elimination is the workhorse for linear equations. The simplex method is the workhorse for 
linear inequalities. We cannot give the simplex method as much space as elimination, but 
the idea can be clear. The simplex method goes from one corner to a neighboring corner of 
lower cost. Eventually (and quite soon in practice) it reaches the corner of minimum cost. 

A corner is a vector \(x \ge 0\) that satisfies the m equations Ax = b with at most m
positive components. The other n — m components are zero. (Those are the free variables. 
Back substitution gives the m basic variables. All variables must be nonnegative or x is 
a false corner.) For a neighboring corner, one zero component of x becomes positive and 
one positive component becomes zero. 


The simplex method must decide which component “enters” by becoming positive, 
and which component “leaves” by becoming zero. That exchange is chosen so as to 
lower the total cost. This is one step of the simplex method, moving toward x*. 


Here is the overall plan. Look at each zero component at the current corner. If it 
changes from 0 to 1, the other nonzeros have to adjust to keep Ax = b. Find the new 
x by back substitution and compute the change in the total cost \(c \cdot x\). This change is the
“reduced cost” r of the new component. The entering variable is the one that gives the 
most negative r. This is the greatest cost reduction for a single unit of a new variable. 


Example 1 Suppose the current corner is P = (4, 0, 0), with the Ph.D. doing all the
work (the cost is $20). If the student works one hour, the cost of x = (3, 1, 0) is down to
$18. The reduced cost is r = −2. If the machine works one hour, then x = (2, 0, 1) also
costs $18. The reduced cost is also r = −2. In this case the simplex method can choose
either the student or the machine as the entering variable.

Even in this small example, the first step may not go immediately to the best \(x^*\).
The method chooses the entering variable before it knows how much of that variable
to include. We computed r when the entering variable changes from 0 to 1, but one unit
may be too much or too little. The method now chooses the leaving variable (the Ph.D.).
It moves to corner Q or R in the figure.

The more of the entering variable we include, the lower the cost. This has to stop 
when one of the positive components (which are adjusting to keep Ax = b) hits zero. The 
leaving variable is the first positive x; to reach zero. When that happens, a neighboring 
corner has been found. Then start again (from the new corner) to find the next variables to 
enter and leave. 

When all reduced costs are positive, the current corner is the optimal x*. 
No zero component can become positive without increasing c - x. No new variable should 
enter. The problem is solved (and we can show that y* is found too). 


Note Generally \(x^*\) is reached in \(\alpha n\) steps, where \(\alpha\) is not large. But examples have been
invented which use an exponential number of simplex steps. Eventually a different approach
was developed, which is guaranteed to reach \(x^*\) in fewer (but more difficult) steps.
The new methods travel through the interior of the feasible set.


Example 2 Minimize the cost \(c \cdot x = 3x_1 + x_2 + 9x_3 + x_4\). The constraints are \(x \ge 0\)
and two equations Ax = b:


\[\begin{aligned} x_1 \phantom{{}+x_2} + 2x_3 + x_4 &= 4 \\ x_2 + x_3 - x_4 &= 2 \end{aligned} \qquad \begin{aligned} &m = 2 \text{ equations} \\ &n = 4 \text{ unknowns.} \end{aligned}\]


A starting corner is x = (4, 2, 0, 0) which costs \(c \cdot x = 14\). It has m = 2 nonzeros and
n − m = 2 zeros. The zeros are \(x_3\) and \(x_4\). The question is whether \(x_3\) or \(x_4\) should enter
(become nonzero). Try one unit of each of them:


If \(x_3 = 1\) and \(x_4 = 0\), then x = (2, 1, 1, 0) costs 16.
If \(x_4 = 1\) and \(x_3 = 0\), then x = (3, 3, 0, 1) costs 13.


Compare those costs with 14. The reduced cost of x3 is r = 2, positive and useless. The 
reduced cost of x4 is r = —1, negative and helpful. The entering variable is x4. 

How much of \(x_4\) can enter? One unit of \(x_4\) made \(x_1\) drop from 4 to 3. Four units will
make \(x_1\) drop from 4 to zero (while \(x_2\) increases all the way to 6). The leaving variable is
\(x_1\). The new corner is x = (0, 6, 0, 4), which costs only \(c \cdot x = 10\). This is the optimal
\(x^*\), but to know that we have to try another simplex step from (0, 6, 0, 4). Suppose \(x_1\) or
\(x_3\) tries to enter:


Start from the corner (0, 6, 0, 4):
If \(x_1 = 1\) and \(x_3 = 0\), then x = (1, 5, 0, 3) costs 11.
If \(x_3 = 1\) and \(x_1 = 0\), then x = (0, 3, 1, 2) costs 14.


Those costs are higher than 10. Both r’s are positive—it does not pay to move. The current 
corner (0, 6, 0, 4) is the solution x*. 
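The reduced costs in Example 2 can be computed without writing out each trial vector. When one unit of a new variable enters, the basic variables must drop by d, where Bd is the entering column of A and B holds the basic columns. The cost then changes by \(r = c_{\text{enter}} - c_B \cdot d\). The sketch below is a hand-rolled illustration for m = 2 (it solves the 2 by 2 system by Cramer's rule); the function names and the zero-based indexing are ours.

```python
from fractions import Fraction

# Example 2: two equations, four unknowns (indices 0..3 stand for x1..x4).
A = [[1, 0, 2, 1],
     [0, 1, 1, -1]]
c = [3, 1, 9, 1]


def solve_2x2(B, rhs):
    """Solve the 2 by 2 system B d = rhs exactly by Cramer's rule."""
    det = B[0][0] * B[1][1] - B[0][1] * B[1][0]
    d0 = Fraction(rhs[0] * B[1][1] - B[0][1] * rhs[1], det)
    d1 = Fraction(B[0][0] * rhs[1] - rhs[0] * B[1][0], det)
    return [d0, d1]


def reduced_cost(basis, entering):
    """Change in total cost when one unit of the entering variable comes in.

    basis    : indices of the two basic (nonzero) variables at the corner
    entering : index of a variable that is zero at the corner
    """
    B = [[A[i][j] for j in basis] for i in range(2)]
    column = [A[i][entering] for i in range(2)]
    d = solve_2x2(B, column)          # basic variables decrease by d
    return c[entering] - sum(c[j] * d_j for j, d_j in zip(basis, d))


print("corner (4,2,0,0):", reduced_cost([0, 1], 2), reduced_cost([0, 1], 3))
print("corner (0,6,0,4):", reduced_cost([1, 3], 0), reduced_cost([1, 3], 2))
```

At the first corner the reduced costs are 2 and −1, so \(x_4\) enters. At the second corner both are positive, which is the stopping test.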
These calculations can be streamlined. Each simplex step solves three linear systems
with the same matrix B. (This is the m by m matrix that keeps the m basic columns of A.)
When a column enters and an old column leaves, there is a quick way to update \(B^{-1}\). That
is how most codes organize the simplex method.

Our text on Computational Science and Engineering includes a short code with comments.
(The code is also on math.mit.edu/cse) The best \(y^*\) solves m equations \(A^Ty^* = c\)
in the m components that are nonzero in \(x^*\). Then we have optimality \(x^Ts = 0\) and this is
duality: Either \(x_j^* = 0\) or the “slack” in \(s^* = c - A^Ty^*\) has \(s_j^* = 0\).

When \(x^* = (0, 4, 0)\) was the optimal corner Q, the cheater’s price was set by \(y^* = 3\).


Interior Point Methods 


The simplex method moves along the edges of the feasible set, eventually reaching the 
optimal corner x*. Interior point methods move inside the feasible set (where x > 0). 
These methods hope to go more directly to x*. They work well. 

One way to stay inside is to put a barrier at the boundary. Add extra cost as a 
logarithm that blows up when any variable x; touches zero. The best vector has x > 0. 
The number \(\theta\) is a small parameter that we move toward zero.


Barrier problem Minimize \(c^Tx - \theta(\log x_1 + \cdots + \log x_n)\) with Ax = b (2)


This cost is nonlinear (but linear programming is already nonlinear from inequalities). 
The constraints x; > 0 are not needed because log x; becomes infinite at x; = 0. 

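The barrier gives an approximate problem for each \(\theta\). The m constraints Ax = b have
Lagrange multipliers \(y_1, \ldots, y_m\). This is the good way to deal with constraints.


y from Lagrange \[L(x, y, \theta) = c^Tx - \theta\Big(\sum \log x_i\Big) - y^T(Ax - b)\] (3)


\(\partial L/\partial y = 0\) brings back Ax = b. The derivatives \(\partial L/\partial x_j\) are interesting!


Optimality in barrier problem \[\frac{\partial L}{\partial x_j} = c_j - \frac{\theta}{x_j} - (A^Ty)_j = 0 \quad\text{which is}\quad x_js_j = \theta.\] (4)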


The true problem has \(x_js_j = 0\). The barrier problem has \(x_js_j = \theta\). The solutions \(x^*(\theta)\)
lie on the central path to \(x^*(0)\). Those n optimality equations \(x_js_j = \theta\) are nonlinear, and
we solve them iteratively by Newton’s method.
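For the homework example the central path can be traced without Newton's method, because there is only one constraint and so only one multiplier y. Setting \(x_j = \theta/s_j\) with \(s_j = c_j - a_jy\) turns Ax = b into a single equation in y, and bisection solves it. This shortcut is our substitute for the Newton iteration, meant only to show the path; the function name and the bracketing interval are assumptions of the sketch.

```python
a = [1.0, 1.0, 2.0]
b = 4.0
c = [5.0, 3.0, 8.0]


def central_path_point(theta, iterations=200):
    """Solve x_j s_j = theta, s = c - a*y, a . x = b by bisection on y.

    ASSUMPTION: a single constraint with all a_j > 0, so that
    sum_j a_j * theta / (c_j - a_j * y) increases with y below y_max.
    """
    y_max = min(c_j / a_j for a_j, c_j in zip(a, c))  # s > 0 needs y < y_max
    lo, hi = y_max - 1.0e6, y_max
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        s = [c_j - a_j * mid for a_j, c_j in zip(a, c)]
        if min(s) <= 0.0:
            hi = mid                      # outside the interior: y too large
            continue
        x = [theta / s_j for s_j in s]
        if sum(a_j * x_j for a_j, x_j in zip(a, x)) > b:
            hi = mid                      # too much work done: lower y
        else:
            lo = mid
    y = lo
    s = [c_j - a_j * y for a_j, c_j in zip(a, c)]
    x = [theta / s_j for s_j in s]
    return x, y, s


for theta in (1.0, 0.1, 0.001, 0.000001):
    x, y, s = central_path_point(theta)
    print(theta, [round(v, 5) for v in x], round(y, 6))
```

As \(\theta\) shrinks, x moves through the interior of the triangle toward the corner (0, 4, 0) and y climbs toward the cheater's price 3.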

The current x, y, s will satisfy Ax = b, x > 0 and \(A^Ty + s = c\), but not \(x_js_j = \theta\).
Newton’s method takes a step \(\Delta x, \Delta y, \Delta s\). By ignoring the second-order term \(\Delta x\,\Delta s\)
in \((x + \Delta x)(s + \Delta s) = \theta\), the corrections in x, y, s come from linear equations:


Newton step
\[\begin{aligned} A\,\Delta x &= 0 \\ A^T\Delta y + \Delta s &= 0 \\ s_j\,\Delta x_j + x_j\,\Delta s_j &= \theta - x_js_j \end{aligned}\] (5)

Newton iteration has quadratic convergence for each \(\theta\), and then \(\theta\) approaches zero.
The duality gap \(x^Ts\) generally goes below \(10^{-8}\) after 20 to 60 steps. The explanation
in my Computational Science and Engineering textbook takes one Newton step in detail,
for the example with four homework problems. I didn’t intend that the student should end
up doing all the work, but \(x^*\) turned out that way.

This interior point method is used almost “as is” in commercial software, for a large 
class of linear and nonlinear optimization problems. 


Problem Set 8.4 


1 Draw the region in the xy plane where x + 2y = 6 and \(x \ge 0\) and \(y \ge 0\). Which
point in this “feasible set” minimizes the cost c = x + 3y? Which point gives
maximum cost? Those points are at corners.


2 Draw the region in the xy plane where \(x + 2y \le 6\), \(2x + y \le 6\), \(x \ge 0\), \(y \ge 0\). It
has four corners. Which corner minimizes the cost c = 2x − y?


3 What are the corners of the set \(x_1 + 2x_2 - x_3 = 4\) with \(x_1, x_2, x_3\) all \(\ge 0\)? Show
that the cost xı + 2x3 can be very negative in this feasible set. This is an example of 
unbounded cost: no minimum. 


4 Start at x = (0,0,2) where the machine solves all four problems for $16. Move 
to x = (0, 1, __) to find the reduced cost r (the savings per hour) for work by the
student. Find r for the Ph.D. by moving to x = (1, 0, __) with 1 hour of Ph.D. work.


5 Start Example 1 from the Ph.D. corner (4, 0, 0) with c changed to [5 3 7]. Show
that r is better for the machine even when the total cost is lower for the student. The
simplex method takes two steps, first to the machine and then to the student for \(x^*\).


6 Choose a different cost vector c so the Ph.D. gets the job. Rewrite the dual problem 
(maximum income to the cheater). 


7 A six-problem homework on which the Ph.D. is fastest gives a second constraint
$2x_1 + x_2 + x_3 = 6$. Then x = (2, 2, 0) shows two hours of work by Ph.D. and
student on each homework. Does this x minimize the cost $c^{T}x$ with c = (5, 3, 8)?


8 These two problems are also dual. Prove weak duality, that always $y^{T}b \le c^{T}x$:


Primal problem    Minimize $c^{T}x$ with $Ax \ge b$ and $x \ge 0$.
Dual problem      Maximize $y^{T}b$ with $A^{T}y \le c$ and $y \ge 0$.
8.5 Fourier Series: Linear Algebra for Functions


This section goes from finite dimensions to infinite dimensions. I want to explain linear 
algebra in infinite-dimensional space, and to show that it still works. First step: look back. 
This book began with vectors and dot products and linear combinations. We begin by 
converting those basic ideas to the infinite case—then the rest will follow. 

What does it mean for a vector to have infinitely many components? There are two 
different answers, both good: 


1. The vector becomes $v = (v_1, v_2, v_3, \ldots)$. It could be $(1, \tfrac12, \tfrac14, \ldots)$.
2. The vector becomes a function f(x). It could be sin x. 


We will go both ways. Then the idea of Fourier series will connect them. 
After vectors come dot products. The natural dot product of two infinite vectors
$(v_1, v_2, \ldots)$ and $(w_1, w_2, \ldots)$ is an infinite series:


Dot product    $v \cdot w = v_1 w_1 + v_2 w_2 + \cdots$    (1)


This brings a new question, which never occurred to us for vectors in $\mathbf{R}^n$. Does this infinite
sum add up to a finite number? Does the series converge? Here is the first and biggest
difference between finite and infinite.

When v = w = (1, 1, 1, ...), the sum certainly does not converge. In that case
$v \cdot w = 1 + 1 + 1 + \cdots$ is infinite. Since v equals w, we are really computing $v \cdot v = \|v\|^2$ = length squared.
The vector (1, 1, 1, ...) has infinite length. We don’t want that
vector. Since we are making the rules, we don’t have to include it. The only vectors to be
allowed are those with finite length:


DEFINITION The vector $(v_1, v_2, \ldots)$ is in our infinite-dimensional “Hilbert space” if and
only if its length $\|v\|$ is finite:


$\|v\|^2 = v \cdot v = v_1^2 + v_2^2 + v_3^2 + \cdots$ must add to a finite number.


Example 1 The vector $v = (1, \tfrac12, \tfrac14, \ldots)$ is included in Hilbert space, because its length
is $2/\sqrt{3}$. We have a geometric series that adds to 4/3. The length of v is the square root:


Length squared    $v \cdot v = 1 + \tfrac14 + \tfrac1{16} + \cdots = \dfrac{1}{1 - \tfrac14} = \dfrac43$.

Question If v and w have finite length, how large can their dot product be? 


Answer The sum $v \cdot w = v_1 w_1 + v_2 w_2 + \cdots$ also adds to a finite number. We can safely
take dot products. The Schwarz inequality is still true:


Schwarz inequality    $|v \cdot w| \le \|v\|\,\|w\|$.    (2)


The ratio of $v \cdot w$ to $\|v\|\,\|w\|$ is still the cosine of $\theta$ (the angle between v and w). Even in
infinite-dimensional space, $|\cos\theta|$ is not greater than 1.
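
The two claims above (finite length for Example 1, and the Schwarz inequality) can be checked numerically with partial sums. This is only a sketch: the second vector w = (1, 1/2, 1/3, ...) is a hypothetical choice made for illustration, and 200 terms stand in for the infinite series.

```python
import math

def partial_dot(v, w, terms=200):
    """Sum the first `terms` products v(k) * w(k), with k starting at 0."""
    return sum(v(k) * w(k) for k in range(terms))

def v(k):
    return 0.5 ** k          # v = (1, 1/2, 1/4, ...) from Example 1

def w(k):
    return 1.0 / (k + 1)     # hypothetical w = (1, 1/2, 1/3, ...), also finite length

length_v = math.sqrt(partial_dot(v, v))   # should approach 2 / sqrt(3)
length_w = math.sqrt(partial_dot(w, w))
dot_vw = partial_dot(v, w)

print(length_v, 2 / math.sqrt(3))
print(abs(dot_vw) <= length_v * length_w)  # Schwarz inequality for the partial sums
```

The truncated sums are ordinary vectors in $\mathbf{R}^{200}$, so the inequality must hold at every stage; the point of Hilbert space is that it survives the limit.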


448 Chapter 8. Applications 


Now change over to functions. Those are the “vectors.” The space of functions f(x),
g(x), h(x), ... defined for $0 \le x \le 2\pi$ must be somehow bigger than $\mathbf{R}^n$. What is the dot
product of f(x) and g(x)? What is the length of f(x)?

Key point in the continuous case: Sums are replaced by integrals. Instead of a sum
of $v_j$ times $w_j$, the dot product is an integral of f(x) times g(x). Change the “dot” to
parentheses with a comma, and change the words “dot product” to inner product:


DEFINITION The inner product of f(x) and g(x), and the length squared, are


$(f, g) = \int_0^{2\pi} f(x)\,g(x)\,dx$  and  $\|f\|^2 = \int_0^{2\pi} (f(x))^2\,dx$.    (3)


The interval $[0, 2\pi]$ where the functions are defined could change to a different interval
like [0, 1] or $(-\infty, \infty)$. We chose $2\pi$ because our first examples are sin x and cos x.


Example 2 The length of f(x) = sin x comes from its inner product with itself:


$(f, f) = \int_0^{2\pi} (\sin x)^2\,dx = \pi$. The length of sin x is $\sqrt{\pi}$.


That is a standard integral in calculus, not part of linear algebra. By writing $\sin^2 x$ as
$\tfrac12 - \tfrac12\cos 2x$, we see it go above and below its average value $\tfrac12$. Multiply that average by
the interval length $2\pi$ to get the answer $\pi$.


More important: sin x and cos x are orthogonal in function space: 


Inner product is zero    $\int_0^{2\pi} \sin x \cos x\,dx = \int_0^{2\pi} \tfrac12 \sin 2x\,dx = \left[-\tfrac14 \cos 2x\right]_0^{2\pi} = 0$.    (4)


This zero is no accident. It is highly important to science. The orthogonality goes beyond 
the two functions sin x and cos x, to an infinite list of sines and cosines. The list contains 


cos 0x (which is 1), sin x, cos x, sin 2x, cos 2x, sin 3x, cos 3x, ....


Every function in that list is orthogonal to every other function in the list. 
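
A quick numerical check of these inner products, as a sketch only: the helper `inner` below is a hypothetical midpoint-rule approximation of the integral in definition (3), not anything from the text.

```python
import math

def inner(f, g, n=20000):
    """Midpoint-rule estimate of the integral of f(x) * g(x) from 0 to 2*pi."""
    h = 2 * math.pi / n
    return h * sum(f((i + 0.5) * h) * g((i + 0.5) * h) for i in range(n))

def one(x):
    return 1.0

def cos2(x):
    return math.cos(2 * x)

def sin3(x):
    return math.sin(3 * x)

sin_sin = inner(math.sin, math.sin)    # length squared of sin x, close to pi
sin_cos = inner(math.sin, math.cos)    # orthogonal pair, close to 0
cos2_sin3 = inner(cos2, sin3)          # another orthogonal pair, close to 0
one_one = inner(one, one)              # length squared of the constant 1, close to 2*pi

print(sin_sin, sin_cos, cos2_sin3, one_one)
```

Any two different functions from the list can be substituted for `cos2` and `sin3`; the estimate should stay near zero.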


Fourier Series 
The Fourier series of a function y(x) is its expansion into sines and cosines: 
$y(x) = a_0 + a_1\cos x + b_1\sin x + a_2\cos 2x + b_2\sin 2x + \cdots$.    (5)


We have an orthogonal basis! The vectors in “function space” are combinations of the sines 
and cosines. On the interval from $x = 2\pi$ to $x = 4\pi$, all our functions repeat what they
did from 0 to $2\pi$. They are “periodic.” The distance between repetitions is the period $2\pi$.

Remember: The list is infinite. The Fourier series is an infinite series. We avoided
the vector v = (1, 1, 1, ...) because its length is infinite; now we avoid a function like
$\tfrac12 + \cos x + \cos 2x + \cos 3x + \cdots$. (Note: This is $\pi$ times the famous delta function $\delta(x)$.
It is an infinite “spike” above a single point. At x = 0 its height $\tfrac12 + 1 + 1 + \cdots$ is infinite.
At all points inside $0 < x < 2\pi$ the series adds in some average way to zero.) The integral
of $\delta(x)$ is 1. But $\int \delta^2(x)\,dx = \infty$, so delta functions are excluded from Hilbert space.

Compute the length of a typical sum f(x): 


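$(f, f) = \int_0^{2\pi} (a_0 + a_1\cos x + b_1\sin x + a_2\cos 2x + \cdots)^2\,dx$

$\quad\;\; = \int_0^{2\pi} (a_0^2 + a_1^2\cos^2 x + b_1^2\sin^2 x + a_2^2\cos^2 2x + \cdots)\,dx$

$\|f\|^2 = 2\pi a_0^2 + \pi(a_1^2 + b_1^2 + a_2^2 + \cdots)$.    (6)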


The step from line 1 to line 2 used orthogonality. All products like cos x cos 2x integrate to
give zero. Line 2 contains what is left: the integrals of each sine and cosine squared. Line
3 evaluates those integrals. (The integral of $1^2$ is $2\pi$, when all other integrals give $\pi$.) If
we divide by their lengths, our functions become orthonormal:


$\dfrac{1}{\sqrt{2\pi}},\ \dfrac{\cos x}{\sqrt{\pi}},\ \dfrac{\sin x}{\sqrt{\pi}},\ \dfrac{\cos 2x}{\sqrt{\pi}},\ \ldots$ is an orthonormal basis for our function space.


These are unit vectors. We could combine them with coefficients $A_0, A_1, B_1, A_2, \ldots$ to
yield a function F(x). Then the $2\pi$ and the $\pi$’s drop out of the formula for length.


Function length = vector length    $\|F\|^2 = (F, F) = A_0^2 + A_1^2 + B_1^2 + A_2^2 + \cdots$.    (7)


Here is the important point, for f(x) as well as F(x). The function has finite length exactly
when the vector of coefficients has finite length. Fourier series gives us a perfect match
between function space and infinite-dimensional Hilbert space. The function is in $L^2$, its
Fourier coefficients are in $\ell^2$.


The function space contains f(x) exactly when the Hilbert space contains the vector
$v = (a_0, a_1, b_1, \ldots)$ of Fourier coefficients. Both f(x) and v have finite length.


Example 3 Suppose f(x) is a “square wave,” equal to 1 for $0 < x < \pi$. Then f(x)
drops to −1 for $\pi < x < 2\pi$. The +1 and −1 repeat forever. This f(x) is an odd
function like the sines, and all its cosine coefficients are zero. We will find its Fourier
series, containing only sines:


Square wave    $f(x) = \dfrac{4}{\pi}\left[\dfrac{\sin x}{1} + \dfrac{\sin 3x}{3} + \dfrac{\sin 5x}{5} + \cdots\right]$.    (8)


The length is $\sqrt{2\pi}$, because at every point $(f(x))^2$ is $(-1)^2$ or $(+1)^2$:


$\|f\|^2 = \int_0^{2\pi} (f(x))^2\,dx = \int_0^{2\pi} 1\,dx = 2\pi$.

At x = 0 the sines are zero and the Fourier series gives zero. This is half way up the jump
from −1 to +1. The Fourier series is also interesting when $x = \pi/2$. At this point the square
wave equals 1, and the sines in (8) alternate between +1 and −1:


Formula for $\pi$    $1 = \dfrac{4}{\pi}\left(1 - \tfrac13 + \tfrac15 - \tfrac17 + \cdots\right)$.    (9)


Multiply by $\pi$ to find a magical formula $4(1 - \tfrac13 + \tfrac15 - \tfrac17 + \cdots)$ for that famous number.
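
Partial sums make formulas (8) and (9) concrete. The sketch below uses two hypothetical helper functions; the term count 2000 is an arbitrary choice, and the slow convergence (the error shrinks roughly like one over the number of terms) is why the formula is magical rather than practical.

```python
import math

def square_wave_series(x, terms):
    """Partial sum of series (8): (4/pi) * (sin x / 1 + sin 3x / 3 + ...)."""
    return (4 / math.pi) * sum(
        math.sin((2 * k + 1) * x) / (2 * k + 1) for k in range(terms)
    )

def pi_from_series(terms):
    """Partial sum of 4 * (1 - 1/3 + 1/5 - 1/7 + ...)."""
    return 4 * sum((-1) ** k / (2 * k + 1) for k in range(terms))

at_zero = square_wave_series(0.0, 10)             # every sine vanishes at x = 0
at_half_pi = square_wave_series(math.pi / 2, 2000)
pi_estimate = pi_from_series(2000)

print(at_zero, at_half_pi, pi_estimate)
```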


The Fourier Coefficients 


How do we find the a’s and b’s which multiply the cosines and sines? For a given
function f(x), we are asking for its Fourier coefficients:


Fourier series    $f(x) = a_0 + a_1\cos x + b_1\sin x + a_2\cos 2x + \cdots$.

Here is the way to find $a_1$. Multiply both sides by cos x. Then integrate from 0 to $2\pi$.
The key is orthogonality! All integrals on the right side are zero, except for $\cos^2 x$:


Coefficient $a_1$    $\int_0^{2\pi} f(x)\cos x\,dx = \int_0^{2\pi} a_1\cos^2 x\,dx = \pi a_1$.    (10)


Divide by $\pi$ and you have $a_1$. To find any other $a_k$, multiply the Fourier series by cos kx.
Integrate from 0 to $2\pi$. Use orthogonality, so only the integral of $a_k\cos^2 kx$ is left. That
integral is $\pi a_k$, and divide by $\pi$:


$a_k = \dfrac{1}{\pi}\int_0^{2\pi} f(x)\cos kx\,dx$  and similarly  $b_k = \dfrac{1}{\pi}\int_0^{2\pi} f(x)\sin kx\,dx$.    (11)


The exception is $a_0$. This time we multiply by cos 0x = 1. The integral of 1 is $2\pi$:


Constant term    $a_0 = \dfrac{1}{2\pi}\int_0^{2\pi} f(x)\cdot 1\,dx$ = average value of f(x).    (12)



I used those formulas to find the Fourier coefficients for the square wave. The integral of 
f(x) cos kx was zero. The integral of f(x) sinkx was 4/k for odd k. 
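
Formulas (11) and (12) can be tried on the square wave with a simple numerical integral. This is a sketch under assumptions: `integrate`, `a_k`, and `b_k` are hypothetical helper names, and the midpoint rule replaces exact integration. An integral of 4/k divided by $\pi$ gives $b_k = 4/(\pi k)$ for odd k, matching series (8).

```python
import math

def square_wave(x):
    """+1 on (0, pi), -1 on (pi, 2*pi), repeated with period 2*pi."""
    return 1.0 if (x % (2 * math.pi)) < math.pi else -1.0

def integrate(f, n=20000):
    """Midpoint-rule estimate of the integral of f from 0 to 2*pi."""
    h = 2 * math.pi / n
    return h * sum(f((i + 0.5) * h) for i in range(n))

def a_k(f, k):
    """Cosine coefficient: equation (12) for k = 0, equation (11) otherwise."""
    if k == 0:
        return integrate(f) / (2 * math.pi)
    return integrate(lambda x: f(x) * math.cos(k * x)) / math.pi

def b_k(f, k):
    """Sine coefficient from equation (11)."""
    return integrate(lambda x: f(x) * math.sin(k * x)) / math.pi

a0 = a_k(square_wave, 0)
a1 = a_k(square_wave, 1)
b1, b2, b3 = (b_k(square_wave, k) for k in (1, 2, 3))
print(a0, a1, b1, b2, b3)
```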


Compare Linear Algebra in $\mathbf{R}^n$


The point to emphasize is how this infinite-dimensional case is so much like the n-dimensional
case. Suppose the nonzero vectors $v_1, \ldots, v_n$ are orthogonal. We want to write the
vector b (instead of the function f(x)) as a combination of those v’s:


Finite orthogonal series    $b = c_1 v_1 + c_2 v_2 + \cdots + c_n v_n$.    (13)

Multiply both sides by $v_1^T$. Use orthogonality, so $v_1^T v_2 = 0$. Only the $c_1$ term is left:

Coefficient $c_1$    $v_1^T b = c_1 v_1^T v_1 + 0 + \cdots + 0$. Therefore $c_1 = v_1^T b / v_1^T v_1$.    (14)


The denominator $v_1^T v_1$ is the length squared, like $\pi$ in equation 11. The numerator $v_1^T b$
is the inner product like $\int f(x)\cos kx\,dx$. Coefficients are easy to find when the basis
vectors are orthogonal. We are just doing one-dimensional projections, to find the
components along each basis vector.

The formulas are even better when the vectors are orthonormal. Then we have unit
vectors. The denominators $v_k^T v_k$ are all 1. You know $c_k = v_k^T b$ in another form:


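Equation for c’s    $c_1 v_1 + \cdots + c_n v_n = b$  or  $\begin{bmatrix} v_1 & \cdots & v_n \end{bmatrix}\begin{bmatrix} c_1 \\ \vdots \\ c_n \end{bmatrix} = b$.


The v’s are in an orthogonal matrix Q. Its inverse is $Q^T$. That gives the c’s:

$Qc = b$ yields $c = Q^T b$. Row by row this is $c_k = q_k^T b$.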


Fourier series is like having a matrix with infinitely many orthogonal columns. Those 
columns are the basis functions 1, cos x, sin x,. . .. After dividing by their lengths we have 
an “infinite orthogonal matrix.” Its inverse is its transpose. Orthogonality is what reduces 
a series of terms to one single term. 
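
Equation (14) in a few lines, as a sketch: the three orthogonal vectors and the vector b below are hypothetical values chosen so the arithmetic is easy to follow by hand.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def orthogonal_coefficients(basis, b):
    """c_k = v_k^T b / v_k^T v_k, valid only when the basis vectors are orthogonal."""
    return [dot(v, b) / dot(v, v) for v in basis]

# Hypothetical orthogonal (not orthonormal) vectors in R^3.
basis = [(1, 1, 1), (1, -1, 0), (1, 1, -2)]
b = (6, 2, 1)

c = orthogonal_coefficients(basis, b)
rebuilt = [sum(ck * v[i] for ck, v in zip(c, basis)) for i in range(3)]

print(c)        # one projection per basis vector
print(rebuilt)  # the combination c1*v1 + c2*v2 + c3*v3 recovers b
```

No linear system is solved: each coefficient comes from one dot product divided by one length squared, which is exactly what the Fourier formulas do with integrals.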


Problem Set 8.5 
1 Integrate the trig identity $2\cos jx\cos kx = \cos(j+k)x + \cos(j-k)x$ to show that
cos jx is orthogonal to cos kx, provided $j \ne k$. What is the result when j = k?


2 Show that 1, x, and $x^2 - \tfrac13$ are orthogonal, when the integration is from x = −1 to
x = 1. Write $f(x) = 2x^2$ as a combination of those orthogonal functions.


3 Find a vector $(w_1, w_2, w_3, \ldots)$ that is orthogonal to $v = (1, \tfrac12, \tfrac14, \ldots)$. Compute its
length $\|w\|$.


4 The first three Legendre polynomials are 1, x, and $x^2 - \tfrac13$. Choose c so that the fourth
polynomial $x^3 - cx$ is orthogonal to the first three. All integrals go from −1 to 1.


5 For the square wave f(x) in Example 3, show that


$\int_0^{2\pi} f(x)\cos x\,dx = 0$    $\int_0^{2\pi} f(x)\sin x\,dx = 4$    $\int_0^{2\pi} f(x)\sin 2x\,dx = 0$.

Which three Fourier coefficients come from those integrals?

6 The square wave has $\|f\|^2 = 2\pi$. Then (6) gives what remarkable sum for $\pi^2$?


7 Graph the square wave. Then graph by hand the sum of two sine terms in its series, 
or graph by machine the sum of 2, 3, and 10 terms. The famous Gibbs phenomenon 
is the oscillation that overshoots the jump (this doesn’t die down with more terms). 


8 Find the lengths of these vectors in Hilbert space: 


l 


w= (ede) 
(b) $v = (1, a, a^2, \ldots)$
(c) f(x) = 1 + sin x.

9 Compute the Fourier coefficients $a_k$ and $b_k$ for f(x) defined from 0 to $2\pi$:
(a) f(x) = 1 for $0 \le x \le \pi$, f(x) = 0 for $\pi < x < 2\pi$
(b) f(x) = x.


10 When f(x) has period $2\pi$, why is its integral from $-\pi$ to $\pi$ the same as from 0 to
$2\pi$? If f(x) is an odd function, f(−x) = −f(x), show that $\int_0^{2\pi} f(x)\,dx$ is zero.
Odd functions only have sine terms, even functions have cosines.


11 From trig identities find the only two terms in the Fourier series for f(x):
(a) $f(x) = \cos^2 x$    (b) $f(x) = \cos(x + \tfrac{\pi}{3})$    (c) $f(x) = \sin^3 x$


12 The functions 1, cos x, sin x, cos 2x, sin 2x, ... are a basis for Hilbert space. Write
the derivatives of those first five functions as combinations of the same five functions. 
What is the 5 by 5 “differentiation matrix” for these functions? 


13 Find the Fourier coefficients $a_k$ and $b_k$ of the square pulse F(x) centered at x = 0:
F(x) = 1/h for $|x| \le h/2$ and F(x) = 0 for $h/2 < |x| \le \pi$.

As $h \to 0$, this F(x) approaches a delta function. Find the limits of $a_k$ and $b_k$.

The Fourier Series section 4.1 of Computational Science and Engineering explains
the sine series, cosine series, complete series, and complex series $\sum c_k e^{ikx}$ on
math.mit.edu/cse.
8.6 Linear Algebra for Statistics and Probability


Statistics deals with data, often in large quantities. Since data tends to go into rectangular
matrices, we expect to see $A^T A$. The least squares problem $A\hat{x} \approx b$ is linear regression.
The best solution $\hat{x}$ fits m observations by n < m parameters. This is a fundamental
application of linear algebra to statistics.

This section goes beyond $A^T A\hat{x} = A^T b$. These unweighted equations assume that the
measurements $b_1, \ldots, b_m$ are equally reliable. When there is good reason to expect higher
accuracy (lower variance) in some $b_i$, those equations should be weighted more heavily.
With what weights $w_1, \ldots, w_m$? And if the $b_i$ are not independent, a covariance matrix $\Sigma$
gives the statistics of the errors. Here are key topics in this section:


1. Weighted least squares and $A^T C A\hat{x} = A^T C b$
2. Variances $\sigma_1^2, \ldots, \sigma_m^2$ and the covariance matrix $\Sigma$
3. Important probability distributions: binomial, Poisson, and normal 


4. Principal Component Analysis (PCA) to find combinations with greatest variance. 


Weighted Least Squares 


To include weights in the m equations Ax = b, multiply each equation i by a weight $w_i$.
Put those m weights into a diagonal matrix W. We are replacing Ax = b by WAx = Wb.
The equations are no more and no less solvable, so we still expect to use least squares.

The least squares equation $A^T A\hat{x} = A^T b$ changes to $(WA)^T WA\hat{x} = (WA)^T Wb$.
The matrix $C = W^T W$ is inside $(WA)^T WA$, in the middle of weighted least squares.


Weighted least squares    $A^T C A\hat{x} = A^T C b$.    (1)


When n = 1 and A = column of 1’s, $\hat{x}$ changes from an average to a weighted average:


Simplest case    $\hat{x} = \dfrac{b_1 + \cdots + b_m}{m}$  changes to  $\hat{x}_W = \dfrac{w_1^2 b_1 + \cdots + w_m^2 b_m}{w_1^2 + \cdots + w_m^2}$.    (2)

This average $\hat{x}_W$ gives greatest weight to the observations $b_i$ that have the largest $w_i$.
We always assume that errors have zero mean. (Subtract the mean if necessary, so there is
no one-sided bias in the measurements.)


How should we choose the weights $w_i$? This depends on the reliability of $b_i$. If that
observation has variance $\sigma_i^2$, then the root mean square error in $b_i$ is $\sigma_i$. When we divide
the equations by $\sigma_1, \ldots, \sigma_m$ (left side together with right side), all variances will equal 1.
So the weight is $w_i = 1/\sigma_i$ and the diagonal of $C = W^T W$ contains the numbers $1/\sigma_i^2$.


The statistically correct matrix is $C = \mathrm{diag}(1/\sigma_1^2, \ldots, 1/\sigma_m^2)$.

This is correct provided the errors $e_i$ and $e_j$ in different equations are statistically
independent. If the errors are dependent, off-diagonal entries show up in the covariance matrix $\Sigma$.
The good choice is still $C = \Sigma^{-1}$ as described in this section.
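
The simplest case, equation (2) with weights $w_i = 1/\sigma_i$, is short enough to try directly. The measurements and standard deviations below are hypothetical numbers, picked only so the result can be checked by hand.

```python
def weighted_average(b, sigma):
    """x_hat_W from equation (2) with w_i = 1/sigma_i, so w_i^2 = 1/sigma_i^2."""
    c = [1.0 / s ** 2 for s in sigma]          # diagonal of C = W^T W
    return sum(ci * bi for ci, bi in zip(c, b)) / sum(c)

b = [10.0, 12.0, 20.0]                         # hypothetical measurements

equal = weighted_average(b, [1.0, 1.0, 1.0])   # ordinary average
trusted = weighted_average(b, [1.0, 1.0, 2.0]) # third measurement is less reliable

print(equal, trusted)
```

Doubling $\sigma_3$ cuts the third weight $w_3^2$ to one quarter, so the outlying measurement 20 pulls the estimate much less.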
Mean and Variance

The two crucial numbers for a random variable are its mean m and its variance $\sigma^2$.
The “expected value” E[e] is found from the probabilities $p_1, p_2, \ldots$ of the possible
errors $e_1, e_2, \ldots$ (and the variance $\sigma^2$ is always measured around the mean).


For a discrete random variable, the error $e_i$ has probability $p_i$ (the $p_i$ add to 1):


Mean    $m = E[e] = \sum e_i p_i$        Variance    $\sigma^2 = E[(e - m)^2] = \sum (e_i - m)^2 p_i$.    (3)


Example 1 Flip a fair coin. The result is 1 (for heads) or 0 (for tails). Those events have
equal probabilities $p_0 = p_1 = 1/2$. The mean is m = 1/2 and the variance is $\sigma^2 = 1/4$:


Mean $= (0)\tfrac12 + (1)\tfrac12 = \tfrac12$        Variance $= \left(0 - \tfrac12\right)^2\tfrac12 + \left(1 - \tfrac12\right)^2\tfrac12 = \tfrac14$.

Example 2 (Binomial) Flip the fair coin N times and count heads. With 3 flips, we
see M = 0, 1, 2, or 3 heads. The chances are 1/8, 3/8, 3/8, 1/8. There are three ways
to see M = 2 heads: HHT, HTH, and THH, and only HHH for M = 3 heads.


For all N, the number of ways to see M heads is the binomial coefficient “N choose M”.
Divide by the total number $2^N$ of all possible outcomes to get the probability for each M:


M heads in N coin flips    $p_M = \dfrac{1}{2^N}\dbinom{N}{M} = \dfrac{1}{2^N}\dfrac{N!}{M!\,(N-M)!}$    Check $\dfrac{1}{2^3}\dfrac{3!}{2!\,1!} = \dfrac38$.    (4)


Gamblers know this instinctively. The probabilities $p_M$ add to $(\tfrac12 + \tfrac12)^N = 1$. The mean
value of the number of heads is m = N/2. The variance around m turns out to be $\sigma^2 = N/4$.
The standard deviation $\sigma = \sqrt{N}/2$ measures the expected spread around the mean.
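
Equation (4) and the claims m = N/2, $\sigma^2$ = N/4 can be checked from the definitions in (3). A small sketch, with hypothetical function names; N = 10 is an arbitrary choice.

```python
from math import comb

def binomial_probs(N):
    """p_M = (N choose M) / 2^N for M = 0, ..., N heads in N fair flips."""
    return [comb(N, M) / 2 ** N for M in range(N + 1)]

def mean_and_variance(probs):
    """Mean and variance of the head count M, using the sums in equation (3)."""
    mean = sum(M * p for M, p in enumerate(probs))
    variance = sum((M - mean) ** 2 * p for M, p in enumerate(probs))
    return mean, variance

p3 = binomial_probs(3)
mean10, var10 = mean_and_variance(binomial_probs(10))

print(p3)             # chances of 0, 1, 2, 3 heads
print(mean10, var10)  # compare with N/2 and N/4 for N = 10
```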


Example 3 (Poisson) A very unfair coin (small $p \ll \tfrac12$) is flipped very often (large N).
The product $\lambda = pN$ is kept fixed. The high probability of tails is 1 − p each time.
So the chance $p_0$ of no heads in N flips (tails every time) is $(1 - p)^N = (1 - \lambda/N)^N$.
For large N this approaches $e^{-\lambda}$. The probability $p_j$ of j heads in N very unfair flips
comes out neatly in terms of the crucial number $\lambda = pN$:


Poisson probability    $p_j = \dfrac{\lambda^j}{j!}\,e^{-\lambda}$.    (5)


Poisson applies to counting infrequent events (low p) over a long time T. Then $\lambda = pT$.
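
The limit can be watched numerically by comparing the exact binomial probability for a very unfair coin with the standard Poisson formula $\lambda^j e^{-\lambda}/j!$. This is a sketch: $\lambda = 2$ and N = 100000 are hypothetical values, and the function names are made up for illustration.

```python
from math import comb, exp, factorial

def binomial_unfair(N, p, j):
    """Chance of exactly j heads in N flips when heads has probability p."""
    return comb(N, j) * p ** j * (1 - p) ** (N - j)

def poisson(lam, j):
    """Poisson probability of j events when the expected count is lam."""
    return lam ** j * exp(-lam) / factorial(j)

lam, N = 2.0, 100000
p = lam / N    # very unfair coin, with pN held at lam

gap = max(abs(binomial_unfair(N, p, j) - poisson(lam, j)) for j in range(6))
no_heads = binomial_unfair(N, p, 0)   # (1 - lam/N)^N, close to e^(-lam)

print(gap, no_heads, exp(-lam))
```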


A continuous random variable will have a probability density function p(x) instead
of $p_1, p_2, \ldots$. “An outcome between x and x + dx has probability p(x) dx.” The total
probability is $\int p(x)\,dx = 1$, since some outcome must happen. Sums become integrals:


Mean m = Expected value $= \int x\,p(x)\,dx$        Variance $\sigma^2 = \int (x - m)^2\,p(x)\,dx$.    (6)
The outstanding example of a probability density function p(x) (called the pdf) is the
normal distribution $N(0, \sigma)$. This has mean zero by symmetry. Its variance is $\sigma^2$:


Normal (Gaussian)    $p(x) = \dfrac{1}{\sqrt{2\pi}\,\sigma}\,e^{-x^2/2\sigma^2}$  with  $\int_{-\infty}^{\infty} p(x)\,dx = 1$.    (7)


The graph of p(x) is the famous bell-shaped curve. The integral of p(x) from $-\sigma$ to $\sigma$
is the probability that a random sample is less than one standard deviation $\sigma$ from the
mean. This is near 2/3. MATLAB’s randn uses the normal distribution with $\sigma = 1$.

This normal p(x) appears everywhere because of the Central Limit Theorem: The
average over many independent trials of another distribution (like binomial) will approach
a normal distribution as $N \to \infty$. A shift produces m = 0 and rescaling produces $\sigma = 1$.


Normalized headcount    $x = \dfrac{M - m}{\sigma} = \dfrac{M - N/2}{\sqrt{N}/2}$ → Normal N(0, 1).

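
The “near 2/3” can be made precise, and the Central Limit Theorem can be seen at work on coin flips. The sketch below assumes $\sigma = 1$; `integrate` is a hypothetical midpoint-rule helper and N = 1000 flips is an arbitrary choice.

```python
import math
from math import comb

def normal_pdf(x, sigma=1.0):
    """Density (7) of the normal distribution with mean zero."""
    return math.exp(-x * x / (2 * sigma * sigma)) / (math.sqrt(2 * math.pi) * sigma)

def integrate(f, a, b, n=10000):
    """Midpoint-rule estimate of the integral of f from a to b."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

within_one_sigma = integrate(normal_pdf, -1.0, 1.0)   # numerical integral of p(x)
exact_within = math.erf(1 / math.sqrt(2))              # same probability via erf

def headcount_within_one_sigma(N):
    """Chance that the head count M in N fair flips has |M - N/2| < sqrt(N)/2."""
    m, sigma = N / 2, math.sqrt(N) / 2
    return sum(comb(N, M) for M in range(N + 1) if abs(M - m) < sigma) / 2 ** N

coin_1000 = headcount_within_one_sigma(1000)
print(within_one_sigma, exact_within, coin_1000)
```

The coin value is only close to the normal value, since the head count is discrete; the gap narrows as N grows.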


The Covariance Matrix 


Now run m different experiments at once. They might be independent, or there might be
some correlation between them. Each measurement b is now a vector with m components.
Those components are the outputs $b_i$ from the m experiments.

If we measure distances from the means $m_i$, each error $e_i = b_i - m_i$ has mean zero.
If two errors $e_i$ and $e_j$ are independent (no relation between them), their product $e_i e_j$
also has mean zero. But if the measurements are by the same observer at nearly the same
time, the errors $e_i$ and $e_j$ could tend to have the same sign or the same size. The errors
in the m experiments could be correlated. The products $e_i e_j$ are weighted by $p_{ij}$ (their
probability): covariance $\sigma_{ij} = \sum\sum p_{ij}\,e_i e_j$. The sum of $e_i^2\,p_{ii}$ is the variance $\sigma_i^2$:


Covariance    $\sigma_{ij} = \sigma_{ji} = E[e_i e_j]$ = expected value of ($e_i$ times $e_j$).    (8)


This is the (i, j) and (j, i) entry of the covariance matrix $\Sigma$. The (i, i) entry is $\sigma_{ii} = \sigma_i^2$.


Example 4 (Multivariate normal) For m random variables, the probability density
function moves from p(x) to $p(b) = p(b_1, \ldots, b_m)$. The normal distribution with mean
zero was controlled by one positive number $\sigma^2$. Now p(b) is controlled by an m by m
positive definite matrix $\Sigma$. This is the covariance matrix and its determinant is $|\Sigma|$:


$p(b) = \dfrac{1}{(\sqrt{2\pi})^m\,|\Sigma|^{1/2}}\,\exp\left(-b^T\Sigma^{-1}b/2\right)$.


The integral of p(b) over m-dimensional space is 1. The integral of $bb^T p(b)$ is $\Sigma$.
The good way to handle that exponent $-b^T\Sigma^{-1}b/2$ is to use the eigenvalues and
orthonormal eigenvectors of $\Sigma$ (linear algebra enters here). When $\Sigma = Q\Lambda Q^T = Q\Lambda Q^{-1}$,
replacing b by Qc will split p(b) into m one-dimensional normal distributions:


$\exp\left(-b^T\Sigma^{-1}b/2\right) = \exp\left(-c^T\Lambda^{-1}c/2\right) = \left(e^{-c_1^2/2\lambda_1}\right)\cdots\left(e^{-c_m^2/2\lambda_m}\right)$.


The determinant has $|\Sigma|^{1/2} = |\Lambda|^{1/2} = (\lambda_1\cdots\lambda_m)^{1/2}$. Each integral over $-\infty < c_i < \infty$ is
back to one dimension, where $\lambda = \sigma^2$. Notice the wonderful fact that after any linear
transformation (here $c = Q^{-1}b$), we still have a multivariate normal distribution.

We could even reach variances = 1 by including $\sqrt{\Lambda}$ in the change from b to z.
Since b = Qc and $c = \sqrt{\Lambda}\,z$, the combined change of variables is $b = Q\sqrt{\Lambda}\,z$:


Standard normal    $b = Q\sqrt{\Lambda}\,z$ changes p(b) db to $p(z)\,dz = \dfrac{e^{-z^T z/2}}{(2\pi)^{m/2}}\,dz$.


This tells us the right weight matrix W to bring Ax = b back to ordinary least squares
for WAx = Wb. We want Wb to become the standard normal z. So W will be the inverse
of $Q\sqrt{\Lambda}$. Better than that, $C = W^T W$ is the inverse of $Q\Lambda Q^T$, which is $\Sigma$.


Summary For independent errors, $\Sigma$ is the diagonal matrix $\mathrm{diag}(\sigma_1^2, \ldots, \sigma_m^2)$. This is
the usual choice. The right weights $w_i$ for the equations Ax = b are $1/\sigma_1, \ldots, 1/\sigma_m$
(this will equalize all variances to 1). The right matrix $C = W^T W$ in the middle of the
weighted least squares equations is exactly $\Sigma^{-1}$:


Weighted least squares    $A^T\Sigma^{-1}A\hat{x} = A^T\Sigma^{-1}b$.    (9)


This choice of weighting returns Ax = b to a least squares problem WAx = Wb with
equally reliable and independent errors. The usual equation $(WA)^T WA\hat{x} = (WA)^T Wb$
is the same as (9).

It was Gauss who found this best linear unbiased estimate $\hat{x}$. Unbiased because the
mean of $x - \hat{x}$ is zero, linear because of equation (9), best because the covariance of $x - \hat{x}$
is as small as possible. That covariance (for error in $\hat{x}$, not error in b!) is important:


Covariance of the best $\hat{x}$    $P = E\left[(x - \hat{x})(x - \hat{x})^T\right] = (A^T\Sigma^{-1}A)^{-1}$.    (10)


Example 5 Your pulse rate is measured ten times by independent doctors, all equally
reliable. The mean error of each $b_i$ is zero, and each variance is $\sigma^2$. Then $\Sigma = \sigma^2 I$.
The ten equations $x = b_i$ produce the 10 by 1 matrix A of all ones. The best estimate $\hat{x}$
is the average of the ten $b_i$. The variance of that average value $\hat{x}$ is the number P:


$P = (A^T\Sigma^{-1}A)^{-1} = \sigma^2/10$ so averaging reduces the variance.


This matrix $P = (A^T\Sigma^{-1}A)^{-1}$ tells how reliable is the result $\hat{x}$ of the experiment
(Problem 8). P does not depend on the b’s in the actual experiment! Those b’s have
probability distributions. Each experiment produces a sample value of $\hat{x}$ from a sample b.
When a small $\Sigma$ gives good reliability of the inputs b, a small P gives good reliability
of the outputs $\hat{x}$. The key formula $P = (A^T\Sigma^{-1}A)^{-1}$ connects those covariances.
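
For the one-unknown case of Example 5 (A is a column of ones and $\Sigma$ is diagonal), the formula for P collapses to a single division. A minimal sketch with hypothetical variances:

```python
def estimate_variance(variances):
    """P = (A^T Sigma^{-1} A)^{-1} when A is a column of ones and Sigma is diagonal.

    A^T Sigma^{-1} A is then just the sum of 1 / sigma_i^2.
    """
    return 1.0 / sum(1.0 / s2 for s2 in variances)

# Ten equally reliable doctors, each with hypothetical variance sigma^2 = 4.
P_equal = estimate_variance([4.0] * 10)        # sigma^2 / 10

# One very precise measurement dominates two poor ones.
P_mixed = estimate_variance([0.01, 4.0, 4.0])

print(P_equal, P_mixed)
```

In both cases P is smaller than every individual variance: adding a measurement, even an unreliable one, never makes the best estimate worse.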


Principal Component Analysis 


These paragraphs are about finding useful information in a data matrix A. Start by mea- 
suring m properties (m features) of n samples. These could be grades in m courses for 
n students (a row for each course, a column for each student). From each row, subtract 
its average so the sample means are zero. We look for a combination of courses and/or 
combination of students for which the data provides the most information. 

Information is “distance from randomness” and it is measured by variance. A large 
variance in course grades means greater information than a small variance. 

The key matrix idea is the Singular Value Decomposition $A = U\Sigma V^T$. We are back
again to $A^T A$ and $AA^T$, because their unit eigenvectors are the singular vectors $v_1, \ldots, v_n$
in V and $u_1, \ldots, u_m$ in U. The singular values in the diagonal matrix $\Sigma$ (not the
covariance) are in decreasing order and $\sigma_1$ is the most important. Weighting the m courses by
the components of $u_1$ gives a “master course” or “eigencourse” with the most significant
grades.


Example 6 Suppose the grades A, B, C, F are worth 4, 2, 0, −6 points. If each course
and each student has one of each grade, then all means are zero. Here is the grade matrix
A with (1, 1, 1, 1) in its nullspace (rank 3). To keep integers, the SVD of A will be written
as 2U times $\Sigma/4$ times $(2V)^T$. So the $\sigma$’s are 12, 8, 4:


$A = \begin{bmatrix} -6 & 2 & 0 & 4 \\ 0 & 4 & -6 & 2 \\ 4 & 0 & 2 & -6 \\ 2 & -6 & 4 & 0 \end{bmatrix} = \begin{bmatrix} -1 & 1 & -1 \\ -1 & -1 & 1 \\ 1 & -1 & -1 \\ 1 & 1 & 1 \end{bmatrix} \begin{bmatrix} 3 & & \\ & 2 & \\ & & 1 \end{bmatrix} \begin{bmatrix} 1 & -1 & 1 & -1 \\ -1 & -1 & 1 & 1 \\ 1 & -1 & -1 & 1 \end{bmatrix}$

Weighting the rows (the courses) by $u_1 = \tfrac12(-1, -1, 1, 1)$ will give the eigencourse.
Weighting the columns (the students) by $v_1 = \tfrac12(1, -1, 1, -1)$ gives the eigenstudent.
The fraction of the grade matrix that is “explained” by that one course and student is
$\sigma_1^2/(\sigma_1^2 + \sigma_2^2 + \sigma_3^2) = 9/14$. The $\sigma$’s in the SVD are the variances $\sigma^2$.
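
The leading singular triple of Example 6 can be verified without any SVD routine. This sketch checks that $Av_1 = 12u_1$ and uses the standard fact that the squared singular values add up to the sum of squares of all entries of A.

```python
A = [
    [-6, 2, 0, 4],
    [0, 4, -6, 2],
    [4, 0, 2, -6],
    [2, -6, 4, 0],
]
u1 = [-0.5, -0.5, 0.5, 0.5]    # eigencourse weights from Example 6
v1 = [0.5, -0.5, 0.5, -0.5]    # eigenstudent weights from Example 6

# A v1 should equal sigma_1 * u1.
Av1 = [sum(a * x for a, x in zip(row, v1)) for row in A]
sigma1 = sum(u * y for u, y in zip(u1, Av1))      # u1^T A v1

# sigma_1^2 + sigma_2^2 + sigma_3^2 equals the sum of squares of the entries.
total = sum(a * a for row in A for a in row)
fraction = sigma1 ** 2 / total

print(Av1, sigma1, fraction)
```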


I guess this master course is what a Director of Admissions is looking for. If all grades 
in gym are the same, that row of A will be all zero—and gym is not part of the master 
course. Probably calculus is a part, but what about students who don’t take calculus? The 
problem of missing data (holes in the matrix A) is extremely difficult for social sciences 
and the census and so much of the statistics of experiments. 


Gene expression data Determining the functions of genes, and combinations of genes, 
is a central problem of genetics. Which genes combine to give which properties? Which 
genes malfunction to give which diseases? 

We now have an incredibly fast way to find gene expression data in the lab. A gene 
microarray is often packed onto an Affymetrix chip, measuring tens of thousands of genes 
from one sample (one person). The understanding of genetic data (bioinformatics) has 
become a tremendous application of linear algebra. 
Problem Set 8.6


1 Which line Ct + D is the best fit to the three independent measurements 1, 2, 4 at
times t = 0, 1, 2 if the variances $\sigma_1^2, \sigma_2^2, \sigma_3^2$ are 1, 1, 2? Use weights $w_i = 1/\sigma_i$.

2 In Problem 1, suppose that the third measurement is totally unreliable. The variance
$\sigma_3^2$ becomes infinite. Then the best line will not use ___. Find the line that goes
through the first two points and solves the first two equations in Ax = b exactly.


3 In Problem 1, suppose that the third measurement is totally reliable. The variance $\sigma_3^2$
approaches zero. Now the best line will go through the third point exactly.
Choose that line to minimize the sum of squares of the first two errors.


4 A single flip of a fair coin (0 or 1) has mean m = 1/2 and variance $\sigma^2 = 1/4$. This
was Example 1. For the sum of two flips, the mean is m = 1. Compute the variance
$\sigma^2$ around this mean, using the outcomes 0, 1, 2 with their probabilities.


5 Instead of adding the flip results, make them two independent experiments. The
outcome is (0, 0), (1, 0), (0, 1) or (1, 1). What is the covariance matrix $\Sigma$?


6 Change Example 1 so that the coin flip can be unfair. The probability is p for heads
and 1 − p for tails. Find the mean m and the variance $\sigma^2$ of this distribution.


7 For two independent measurements $x = b_1$ and $x = b_2$, the best $\hat{x}$ should be some
weighted average $\hat{x} = ab_1 + (1 - a)b_2$. When $b_1$ and $b_2$ have mean zero and
variances $\sigma_1^2$ and $\sigma_2^2$, the variance of $\hat{x}$ will be $P = a^2\sigma_1^2 + (1 - a)^2\sigma_2^2$. Choose the
number a that minimizes P: dP/da = 0.

Show that this a gives the $\hat{x}$ in equation (2) which the text claimed is best, using
weights $w_1 = 1/\sigma_1$ and $w_2 = 1/\sigma_2$.


8 The least squares estimate correctly weighted by $\Sigma^{-1}$ is $\hat{x} = (A^T\Sigma^{-1}A)^{-1}A^T\Sigma^{-1}b$.
Call that $\hat{x} = Lb$. If b contains an error vector e, then $\hat{x}$ contains the error Le.


The covariance matrix of those output errors Le is their expected value (average
value) $P = E[(Le)(Le)^T] = L\,E[ee^T]\,L^T = L\Sigma L^T$. Problem: Do the multiplication
$L\Sigma L^T$ to show that P equals $(A^T\Sigma^{-1}A)^{-1}$ as predicted in equation (10).


9 Change the grades to 3, 1, −1, −3 for A, B, C, F. Show that the SVD of this grade
matrix has the same $u_1, u_2, v_1, v_2$ (same eigencourses) as in Example 6, but now A
has rank 2.

Grade matrix    $A = \begin{bmatrix} 3 & -1 & 1 & -3 \\ -1 & 3 & -3 & 1 \\ -3 & 1 & -1 & 3 \\ 1 & -3 & 3 & -1 \end{bmatrix}$


Notes One way to deal with missing entries in A is to complete the matrix to have
minimum rank. And statistics makes major use of the pseudoinverse $A^+$ (which is
exactly the left inverse $(A^T A)^{-1}A^T$ from the normal equation when $A^T A$ is invertible).
8.7 Computer Graphics


Computer graphics deals with images. The images are moved around. Their scale is changed.
Three dimensions are projected onto two dimensions. All the main operations are done by
matrices, but the shape of these matrices is surprising.

The transformations of three-dimensional space are done with 4 by 4 matrices. You 
would expect 3 by 3. The reason for the change is that one of the four key operations 
cannot be done with a 3 by 3 matrix multiplication. Here are the four operations: 


Translation (shift the origin to another point $P_0 = (x_0, y_0, z_0)$)
Rescaling (by c in all directions or by different factors $c_1, c_2, c_3$)
Rotation (around an axis through the origin or an axis through $P_0$)
Projection (onto a plane through the origin or a plane through $P_0$).

Translation is the easiest: just add $(x_0, y_0, z_0)$ to every point. But this is not linear! No 3
by 3 matrix can move the origin. So we change the coordinates of the origin to (0, 0, 0, 1).
This is why the matrices are 4 by 4. The “homogeneous coordinates” of the point (x, y, z)
are (x, y, z, 1) and we now show how they work.


1. Translation Shift the whole three-dimensional space along the vector $v_0$. The origin
moves to $(x_0, y_0, z_0)$. This vector $v_0$ is added to every point v in $\mathbf{R}^3$. Using homogeneous
coordinates, the 4 by 4 matrix T shifts the whole space by $v_0$:


Translation matrix    $T = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ x_0 & y_0 & z_0 & 1 \end{bmatrix}$


Important: Computer graphics works with row vectors. We have row times matrix instead
of matrix times column. You can quickly check that $[0\ 0\ 0\ 1]\,T = [x_0\ y_0\ z_0\ 1]$.

To move the points (0, 0, 0) and (x, y, z) by $v_0$, change to homogeneous coordinates
(0, 0, 0, 1) and (x, y, z, 1). Then multiply by T. A row vector times T gives a row vector.
Every v moves to $v + v_0$: $[x\ y\ z\ 1]\,T = [x + x_0\ \ y + y_0\ \ z + z_0\ \ 1]$.

The output tells where any v will move. (It goes to $v + v_0$.) Translation is now achieved
by a matrix, which was impossible in $\mathbf{R}^3$.
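
The row-times-matrix convention is easy to try in code. This is a sketch with hypothetical helper names; the translation matrix carries the shift in its last row, which is the form forced by the check $[0\ 0\ 0\ 1]\,T = [x_0\ y_0\ z_0\ 1]$, and the shift (1, 4, 3) is an arbitrary example.

```python
def row_times_matrix(v, M):
    """Row vector v times matrix M (the convention used in computer graphics)."""
    return [sum(v[i] * M[i][j] for i in range(len(v))) for j in range(len(M[0]))]

def translation(x0, y0, z0):
    """4 by 4 matrix T that moves (x, y, z, 1) to (x + x0, y + y0, z + z0, 1)."""
    return [
        [1, 0, 0, 0],
        [0, 1, 0, 0],
        [0, 0, 1, 0],
        [x0, y0, z0, 1],
    ]

T = translation(1, 4, 3)                            # hypothetical shift v0 = (1, 4, 3)
origin_moved = row_times_matrix([0, 0, 0, 1], T)    # the origin goes to v0
point_moved = row_times_matrix([2, 5, 7, 1], T)     # v goes to v + v0

print(origin_moved, point_moved)
```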


2. Scaling To make a picture fit a page, we change its width and height. A Xerox copier
will rescale a figure by 90%. In linear algebra, we multiply by .9 times the identity matrix.
That matrix is normally 2 by 2 for a plane and 3 by 3 for a solid. In computer graphics,
with homogeneous coordinates, the matrix is one size larger:


Rescale the plane:  $S = \begin{bmatrix} .9 & & \\ & .9 & \\ & & 1 \end{bmatrix}$        Rescale a solid:  $S = \begin{bmatrix} c & 0 & 0 & 0 \\ 0 & c & 0 & 0 \\ 0 & 0 & c & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$



Important: S is not cI. We keep the “1” in the lower corner. Then [x, y, 1] times S is the
correct answer in homogeneous coordinates. The origin stays in its normal position because
[0 0 1]S = [0 0 1].

If we change that 1 to c, the result is strange. The point (cx, cy, cz, c) is the same
as (x, y, z, 1). The special property of homogeneous coordinates is that multiplying by cI
does not move the point. The origin in $\mathbf{R}^3$ has homogeneous coordinates (0, 0, 0, 1) and
(0, 0, 0, c) for every nonzero c. This is the idea behind the word “homogeneous.”

Scaling can be different in different directions. To fit a full-page picture onto a
half-page, scale the y direction by $\tfrac12$. To create a margin, scale the x direction by $\tfrac34$. The
graphics matrix is diagonal but not 2 by 2. It is 3 by 3 to rescale a plane and 4 by 4 to
rescale a space:


Scaling matrices    $S = \begin{bmatrix} c_1 & & \\ & c_2 & \\ & & 1 \end{bmatrix}$  and  $S = \begin{bmatrix} c_1 & & & \\ & c_2 & & \\ & & c_3 & \\ & & & 1 \end{bmatrix}$


That last matrix S rescales the x, y, z directions by positive numbers $c_1, c_2, c_3$. The extra
column in all these matrices leaves the extra 1 at the end of every vector.


Summary The scaling matrix S is the same size as the translation matrix T. They can
be multiplied. To translate and then rescale, multiply vTS. To rescale and then translate,
multiply vST. Are those different? Yes.
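
A small example in the plane shows why the order matters. This is a sketch: the shift by (1, 1), the scale factor 2, and the point (3, 4) are hypothetical choices, and the helpers are written out only to keep the example self-contained.

```python
def matmul(A, B):
    """Product of two square matrices stored as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def apply(v, M):
    """Row vector v times matrix M."""
    return [sum(v[i] * M[i][j] for i in range(len(v))) for j in range(len(M[0]))]

T = [[1, 0, 0], [0, 1, 0], [1, 1, 1]]    # translate the plane by (1, 1)
S = [[2, 0, 0], [0, 2, 0], [0, 0, 1]]    # rescale the plane by c = 2

v = [3, 4, 1]                            # the point (3, 4) in homogeneous coordinates

vTS = apply(v, matmul(T, S))             # translate first, then rescale
vST = apply(v, matmul(S, T))             # rescale first, then translate

print(vTS, vST)
```

In vTS the shift itself gets doubled along with the point; in vST it does not.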


The point (x, y, z) in $\mathbf{R}^3$ has homogeneous coordinates (x, y, z, 1) in $\mathbf{P}^3$. This
“projective space” is not the same as $\mathbf{R}^4$. It is still three-dimensional. To achieve such a thing,
(cx, cy, cz, c) is the same point as (x, y, z, 1). Those points of projective space $\mathbf{P}^3$ are
really lines through the origin in $\mathbf{R}^4$.

Computer graphics uses affine transformations, linear plus shift. An affine transformation
T is executed on $\mathbf{P}^3$ by a 4 by 4 matrix with a special fourth column:

$$A = \begin{bmatrix} a_{11} & a_{12} & a_{13} & 0 \\ a_{21} & a_{22} & a_{23} & 0 \\ a_{31} & a_{32} & a_{33} & 0 \\ a_{41} & a_{42} & a_{43} & 1 \end{bmatrix} = \begin{bmatrix} T(1,0,0) & 0 \\ T(0,1,0) & 0 \\ T(0,0,1) & 0 \\ T(0,0,0) & 1 \end{bmatrix}$$


The usual 3 by 3 matrix tells us three outputs, this tells four. The usual outputs come 
from the inputs (1,0, 0) and (0, 1,0) and (0, 0, 1). When the transformation is linear, three 
outputs reveal everything. When the transformation is affine, the matrix also contains the 
output from (0, 0,0). Then we know the shift. 


3. Rotation A rotation in $\mathbf{R}^2$ or $\mathbf{R}^3$ is achieved by an orthogonal matrix Q. The determinant
is +1. (With determinant −1 we get an extra reflection through a mirror.) Include the
extra column when you use homogeneous coordinates!

$$\text{Plane rotation} \quad Q = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \quad \text{becomes} \quad R = \begin{bmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix}$$

This matrix rotates the plane around the origin. How would we rotate around a
different point (4, 5)? The answer brings out the beauty of homogeneous coordinates.
Translate (4, 5) to (0, 0), then rotate by θ, then translate (0, 0) back to (4, 5):

$$v\,T_-\,R\,T_+ = \begin{bmatrix} x & y & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -4 & -5 & 1 \end{bmatrix} \begin{bmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 4 & 5 & 1 \end{bmatrix}$$

I won’t multiply. The point is to apply the matrices one at a time: v translates to $vT_-$, then
rotates to $vT_-R$, and translates back to $vT_-RT_+$. Because each point [x y 1] is a row
vector, $T_-$ acts first. The center of rotation (4, 5), otherwise known as (4, 5, 1), moves
first to (0, 0, 1). Rotation doesn’t change it. Then $T_+$ moves it back to (4, 5, 1). All as it
should be. The point (4, 6, 1) moves to (0, 1, 1), then turns by θ and moves back.
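The three steps can be sketched in a few lines of Python. This is an illustration only: the helper names and the choice of a quarter turn are assumptions, and the matrices are exactly the three factors written above.

```python
import math

def matmul(A, B):
    """Matrix product of A and B, both given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def vec_mat(v, M):
    """Row vector v times matrix M."""
    return [sum(v[i] * M[i][j] for i in range(len(v))) for j in range(len(M[0]))]

theta = math.pi / 2                      # assumed angle: a quarter turn
c, s = math.cos(theta), math.sin(theta)
T_minus = [[1, 0, 0], [0, 1, 0], [-4, -5, 1]]   # move (4, 5) to the origin
R = [[c, -s, 0], [s, c, 0], [0, 0, 1]]          # rotate around the origin
T_plus = [[1, 0, 0], [0, 1, 0], [4, 5, 1]]      # move the origin back to (4, 5)

M = matmul(matmul(T_minus, R), T_plus)   # T_minus acts first on a row vector
center = vec_mat([4, 5, 1], M)           # the center of rotation should not move
neighbor = vec_mat([4, 6, 1], M)         # a point at distance 1 from the center
distance = math.hypot(neighbor[0] - 4, neighbor[1] - 5)
```

The center stays at (4, 5, 1) and the neighboring point stays at distance 1 from it, as a rotation requires.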

In three dimensions, every rotation Q turns around an axis. The axis doesn’t move: it
is a line of eigenvectors with λ = 1. Suppose the axis is in the z direction. The 1 in Q is
to leave the z axis alone, the extra 1 in R is to leave the origin alone:

$$Q = \begin{bmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix} \quad \text{and} \quad R = \begin{bmatrix} & & & 0 \\ & Q & & 0 \\ & & & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$$

Now suppose the rotation is around the unit vector $a = (a_1, a_2, a_3)$. With this axis a, the
rotation matrix Q which fits into R has three parts:

$$Q = (\cos\theta)\,I + (1 - \cos\theta) \begin{bmatrix} a_1^2 & a_1a_2 & a_1a_3 \\ a_1a_2 & a_2^2 & a_2a_3 \\ a_1a_3 & a_2a_3 & a_3^2 \end{bmatrix} - \sin\theta \begin{bmatrix} 0 & a_3 & -a_2 \\ -a_3 & 0 & a_1 \\ a_2 & -a_1 & 0 \end{bmatrix}. \qquad (1)$$

The axis doesn’t move because aQ = a. When a = (0, 0, 1) is in the z direction, this Q
becomes the previous Q, for rotation around the z axis.
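Both claims can be checked numerically. The following Python sketch builds Q from its three parts; the function name, the test axis (1/3, 2/3, 2/3), and the angles are assumptions made for the example.

```python
import math

def rotation_about_axis(a, theta):
    """Q = cos(theta) I + (1 - cos(theta)) a a^T - sin(theta) K for a unit vector a."""
    c, s = math.cos(theta), math.sin(theta)
    a1, a2, a3 = a
    K = [[0, a3, -a2], [-a3, 0, a1], [a2, -a1, 0]]   # the antisymmetric third part
    return [[(c if i == j else 0.0) + (1 - c) * a[i] * a[j] - s * K[i][j]
             for j in range(3)] for i in range(3)]

def vec_mat(v, M):
    """Row vector v times matrix M."""
    return [sum(v[i] * M[i][j] for i in range(3)) for j in range(3)]

theta = 0.3
Qz = rotation_about_axis([0.0, 0.0, 1.0], theta)     # should be the z-axis rotation
expected_Qz = [[math.cos(theta), -math.sin(theta), 0.0],
               [math.sin(theta), math.cos(theta), 0.0],
               [0.0, 0.0, 1.0]]

a = [1 / 3, 2 / 3, 2 / 3]                            # a unit vector
aQ = vec_mat(a, rotation_about_axis(a, 1.1))         # the axis should not move
```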

The linear transformation Q always goes in the upper left block of R. Below it we see 
zeros, because rotation leaves the origin in place. When those are not zeros, the transfor- 
mation is affine and the origin moves. 


4. Projection In a linear algebra course, most planes go through the origin. In real life,
most don’t. A plane through the origin is a vector space. The other planes are affine spaces,
sometimes called “flats.” An affine space is what comes from translating a vector space.

We want to project three-dimensional vectors onto planes. Start with a plane through
the origin, whose unit normal vector is n. (We will keep n as a column vector.) The vectors
in the plane satisfy $n^T v = 0$. The usual projection onto the plane is the matrix $I - nn^T$.
To project a vector, multiply by this matrix. The vector n is projected to zero, and the
in-plane vectors v are projected onto themselves:

$$(I - nn^T)\,n = n - n(n^T n) = 0 \quad \text{and} \quad (I - nn^T)\,v = v - n(n^T v) = v.$$

In homogeneous coordinates the projection matrix becomes 4 by 4 (but the origin doesn’t
move):

$$P = \begin{bmatrix} I - nn^T & 0 \\ 0 & 1 \end{bmatrix}$$


Now project onto a plane $n^T(v - v_0) = 0$ that does not go through the origin. One point on
the plane is $v_0$. This is an affine space (or a flat). It is like the solutions to Av = b when
the right side is not zero. One particular solution $v_0$ is added to the nullspace, to produce
a flat.

The projection onto the flat has three steps. Translate $v_0$ to the origin by $T_-$. Project
along the n direction, and translate back along the row vector $v_0$:

$$\text{Projection onto a flat} \quad T_-\,P\,T_+ = \begin{bmatrix} I & 0 \\ -v_0 & 1 \end{bmatrix} \begin{bmatrix} I - nn^T & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} I & 0 \\ v_0 & 1 \end{bmatrix}.$$

I can’t help noticing that $T_-$ and $T_+$ are inverse matrices: translate and translate back. They
are like the elementary matrices of Chapter 2.
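Here is a hedged Python sketch of the three steps, using exact fractions. The normal n = (1/3, 2/3, 2/3), the point $v_0$ = (3, 0, 0) on the plane $n^T v = 1$, and the helper names are assumptions picked for this illustration.

```python
from fractions import Fraction as F

def matmul(A, B):
    """Matrix product of A and B, both given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def vec_mat(v, M):
    """Row vector v times matrix M."""
    return [sum(v[i] * M[i][j] for i in range(len(v))) for j in range(len(M[0]))]

def translation(v0):
    """4 by 4 translation matrix with the row vector v0 in its last row."""
    return [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [v0[0], v0[1], v0[2], 1]]

def projection(n):
    """4 by 4 matrix with I - n n^T in the upper left block and 1 in the corner."""
    P = [[(1 if i == j else 0) - n[i] * n[j] for j in range(3)] + [0] for i in range(3)]
    return P + [[0, 0, 0, 1]]

n = [F(1, 3), F(2, 3), F(2, 3)]           # unit normal
v0 = [F(3), F(0), F(0)]                   # a point on the plane n . v = 1
flat = matmul(matmul(translation([-x for x in v0]), projection(n)), translation(v0))

p = vec_mat([3, 3, 3, 1], flat)           # project the point (3, 3, 3)
on_plane = sum(n[i] * p[i] for i in range(3))   # should equal 1
```

The projected point lands on the flat, and the last coordinate is still 1.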


The exercises will include reflection matrices, also known as mirror matrices. These 
are the fifth type needed in computer graphics. A reflection moves each point twice as far 
as a projection: the reflection goes through the plane and out the other side. So change
the projection $I - nn^T$ to $I - 2nn^T$ for a mirror matrix.

The matrix P gave a “parallel” projection. All points move parallel to n, until they 
reach the plane. The other choice in computer graphics is a “perspective” projection. This 
is more popular because it includes foreshortening. With perspective, an object looks larger 
as it moves closer. Instead of staying parallel to n (and parallel to each other), the lines of 
projection come toward the eye—the center of projection. This is how we perceive depth 
in a two-dimensional photograph. 


The basic problem of computer graphics starts with a scene and a viewing position. Ideally, 
the image on the screen is what the viewer would see. The simplest image assigns just one 
bit to every small picture element—called a pixel. It is light or dark. This gives a black 
and white picture with no shading. You would not approve. In practice, we assign shading 
levels between 0 and $2^8$ for three colors like red, green, and blue. That means 8 × 3 = 24
bits for each pixel. Multiply by the number of pixels, and a lot of memory is needed!

Physically, a raster frame buffer directs the electron beam. It scans like a television 
set. The quality is controlled by the number of pixels and the number of bits per pixel. 
In this area, one standard text is Computer Graphics: Principles and Practice by Foley,
van Dam, Feiner, and Hughes (Addison-Wesley, 1995). The newer books still use
homogeneous coordinates to handle translations. My best references were notes by Ronald
Goldman and by Tony DeRose.


= REVIEW OF THE KEY IDEAS =


1. Computer graphics needs shift operations $T(v) = v + v_0$ as well as linear operations
T(v) = Av.

2. A shift in $\mathbf{R}^n$ can be executed by a matrix of order n + 1, using homogeneous
coordinates.

3. The extra component 1 in [x y z 1] is preserved when all matrices have the numbers
0, 0, 0, 1 as last column.


Problem Set 8.7 


1 A typical point in $\mathbf{R}^3$ is xi + yj + zk. The coordinate vectors i, j, and k are
(1, 0, 0), (0, 1, 0), (0, 0, 1). The coordinates of the point are (x, y, z).

This point in computer graphics is xi + yj + zk + origin. Its homogeneous coordinates
are ( , , , ). Other coordinates for the same point are ( , , , ).

2 A linear transformation T is determined when we know T(i), T(j), T(k). For an
affine transformation we also need T(____). The input point (x, y, z, 1) is transformed
to xT(i) + yT(j) + zT(k) + ____.

3 Multiply the 4 by 4 matrix T for translation along (1, 4, 3) and the matrix $T_1$ for
translation along (0, 2, 5). The product $TT_1$ is translation along ____.

4 Write down the 4 by 4 matrix S that scales by a constant c. Multiply ST and also
TS, where T is translation by (1, 4, 3). To blow up the picture around the center
point (1, 4, 3), would you use vST or vTS?

5 What scaling matrix S (in homogeneous coordinates, so 3 by 3) would produce a
1 by 1 square page from a standard 8.5 by 11 page?

6 What 4 by 4 matrix would move a corner of a cube to the origin and then multiply
all lengths by 2? The corner of the cube is originally at (1, 1, 2).

7 When the three matrices in equation 1 multiply the unit vector a, show that they give
(cos θ)a and (1 − cos θ)a and 0. Addition gives aQ = a and the rotation axis is not
moved.

8 If b is perpendicular to a, multiply by the three matrices in 1 to get (cos θ)b and 0
and a vector perpendicular to b. So Qb makes an angle θ with b. This is rotation.

9 What is the 3 by 3 projection matrix $I - nn^T$ onto the plane $\frac{2}{3}x + \frac{2}{3}y + \frac{1}{3}z = 0$? In
homogeneous coordinates add 0, 0, 0, 1 as an extra row and column in P.

10 With the same 4 by 4 matrix P, multiply $T_- P\, T_+$ to find the projection matrix onto
the plane $\frac{2}{3}x + \frac{2}{3}y + \frac{1}{3}z = 1$. The translation $T_-$ moves a point on that plane (choose
one) to (0, 0, 0, 1). The inverse matrix $T_+$ moves it back.

11 Project (3, 3, 3) onto those planes. Use P in Problem 9 and $T_- P\, T_+$ in Problem 10.

12 If you project a square onto a plane, what shape do you get?

13 If you project a cube onto a plane, what is the outline of the projection? Make the
projection plane perpendicular to a diagonal of the cube.

14 The 3 by 3 mirror matrix that reflects through the plane $n^T v = 0$ is $M = I - 2nn^T$.
Find the reflection of the point (3, 3, 3) in the plane $\frac{2}{3}x + \frac{2}{3}y + \frac{1}{3}z = 0$.

15 Find the reflection of (3, 3, 3) in the plane $\frac{2}{3}x + \frac{2}{3}y + \frac{1}{3}z = 1$. Take three steps
$T_- M\, T_+$ using 4 by 4 matrices: translate by $T_-$ so the plane goes through the origin,
reflect the translated point $(3, 3, 3, 1)T_-$ in that plane, then translate back by $T_+$.

16 The vector between the origin (0, 0, 0, 1) and the point (x, y, z, 1) is the difference
v = ____. In homogeneous coordinates, vectors end in ____. So we add a ____
to a point, not a point to a point.

17 If you multiply only the last coordinate of each point to get (x, y, z, c), you rescale
the whole space by the number ____. This is because the point (x, y, z, c) is the
same as ( , , , 1).


Chapter 9 


Numerical Linear Algebra 


9.1 Gaussian Elimination in Practice 


Numerical linear algebra is a struggle for quick solutions and also accurate solutions. We 
need efficiency but we have to avoid instability. In Gaussian elimination, the main freedom 
(always available) is to exchange equations. This section explains when to exchange rows 
for the sake of speed, and when to do it for the sake of accuracy. 

The key to accuracy is to avoid unnecessarily large numbers. Often that requires us to 
avoid small numbers! A small pivot generally means large multipliers (since we divide by 
the pivot). A good plan is “partial pivoting”, to choose the largest candidate in each new 
column as the pivot. We will see why this strategy is built into computer programs. 

Other row exchanges are done to save elimination steps. In practice, most large matrices 
are sparse—almost all entries are zeros. Elimination is fastest when the equations are 
ordered to put those zeros (as far as possible) outside the band of nonzeros. Zeros inside 
the band “fill in” during elimination—the zeros are destroyed and don’t help. 

Section 9.2 is about instability that can’t be avoided. It is built into the problem, and
this sensitivity is measured by the “condition number”. Then Section 9.3 describes how to
solve Ax = b by iterations. Instead of direct elimination, the computer solves an easier
equation many times. Each answer $x_k$ leads to the next guess $x_{k+1}$. For good iterations,
like conjugate gradients, the $x_k$ converge quickly to $x = A^{-1}b$.


The Fastest Supercomputer 


A new supercomputing record was announced by IBM and Los Alamos on May 20, 2008. 
The Roadrunner was the first to achieve a quadrillion ($10^{15}$) floating-point operations per
second: a petaflop machine. The benchmark for this world record was a large dense linear 
system Ax = b: linear algebra. 

The LINPACK software does elimination with partial pivoting. The biggest difference 
from this book is to organize the steps to use large submatrices and never single numbers. 
Roadrunner is a multicore Linux cluster with very remarkable processors, based on the
Cell Broadband Engine from Sony’s PlayStation 3. The market for video games dwarfs
scientific computing and led to astonishing acceleration in the chips.

This path to petascale is not the approach taken by IBM’s BlueGene. A key issue was to 
count the standard quad-core processors that a petaflop machine would need: 32,000. The 
new architecture uses much less power, but its hybrid design has a price: a code needs three 
separate compilers and explicit instructions to move all the data. Please see the excellent 
article in SIAM News (siam.org, July 2008) and the details on www.lanl.gov/roadrunner.

The TOP500 project ranks the most powerful computer systems in the world. Road- 
runner and BlueGene are #1 and #2 as this page is written in 2009. 

Our thinking about matrix calculations is reflected in the highly optimized BLAS 
(Basic Linear Algebra Subroutines). They come at levels 1, 2, and 3: 


Level 1 Linear combinations of vectors au + v: O(n) work
Level 2 Matrix-vector multiplications Au + v: $O(n^2)$ work
Level 3 Matrix-matrix multiplications AB + C: $O(n^3)$ work


Level 1 is a single elimination step (multiply row j by $\ell_{ij}$ and subtract from row i). Level 2
can eliminate a whole column at once. A high performance solver is rich in Level 3 BLAS
(AB has $2n^3$ flops and $2n^2$ data, a good ratio of work to talk).

It is data passing and storage retrieval that limit the speed of parallel processing. The 
high-velocity cache between main memory and floating-point computation has to be fully 
used! Top speed demands a block matrix approach to elimination. 

The big change, coming now, is parallel processing at the chip level. 


Roundoff Error and Partial Pivoting 


Up to now, any pivot (nonzero of course) was accepted. In practice a small pivot is danger- 
ous. A catastrophe can occur when numbers of different sizes are added. Computers keep a 
fixed number of significant digits (say three decimals, for a very weak machine). The sum 
10,000 + 1 is rounded off to 10,000. The “1” is completely lost. Watch how that changes 
the solution to this problem: 


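$$\begin{aligned} .0001u + v &= 1 \\ -u + v &= 0 \end{aligned} \qquad \text{starts with coefficient matrix} \qquad A = \begin{bmatrix} .0001 & 1 \\ -1 & 1 \end{bmatrix}.$$
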
If we accept .0001 as the pivot, elimination adds 10,000 times row 1 to row 2. Roundoff 
leaves 

10,000v = 10,000 instead of 10,001v = 10,000. 


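The computed answer v = 1 is near the true v = .9999. But then back substitution puts
the wrong v into the equation for u:

.0001u + 1 = 1 instead of .0001u + .9999 = 1.

The first equation gives u = 0. The correct answer (look at the second equation) is u =
1.000. By losing the “1” in the matrix, we have lost the solution. The change from 10,001
to 10,000 has changed the answer from u = 1 to u = 0 (100% error!).

If we exchange rows, even this weak computer finds an answer that is correct to three
places:

$$\begin{aligned} -u + v &= 0 \\ .0001u + v &= 1 \end{aligned} \quad \longrightarrow \quad \begin{aligned} -u + v &= 0 \\ v &= 1 \end{aligned} \quad \longrightarrow \quad \begin{aligned} u &= 1 \\ v &= 1. \end{aligned}$$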
The original pivots were .0001 and 10,000, badly scaled. After a row exchange the exact
pivots are −1 and 1.0001, well scaled. The computed pivots −1 and 1 come close to the
exact values. Small pivots bring numerical instability, and the remedy is partial pivoting.
The kth pivot is decided when we reach and search column k: 


Choose the largest number in row k or below. Exchange its row with row k. 
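The weak machine can be imitated in Python. This is only a sketch under stated assumptions: `fl` is a hypothetical helper that rounds every intermediate result to three significant digits, and `solve2` eliminates a 2 by 2 system using whatever pivot it is handed first.

```python
from math import floor, log10

def fl(x, digits=3):
    """Round x to the given number of significant digits (the 'weak machine')."""
    if x == 0:
        return 0.0
    exponent = floor(log10(abs(x)))
    return round(x, digits - 1 - exponent)

def solve2(a11, a12, b1, a21, a22, b2):
    """Eliminate with pivot a11, rounding after every operation, then back substitute."""
    m = fl(a21 / a11)                       # the multiplier
    pivot2 = fl(a22 - fl(m * a12))          # second pivot after roundoff
    rhs2 = fl(b2 - fl(m * b1))
    v = fl(rhs2 / pivot2)
    u = fl(fl(b1 - fl(a12 * v)) / a11)      # back substitution into equation 1
    return u, v

# .0001u + v = 1 and -u + v = 0, in the original order (tiny pivot .0001)
small_pivot = solve2(0.0001, 1, 1, -1, 1, 0)
# the same equations after a row exchange (pivot -1)
exchanged = solve2(-1, 1, 0, 0.0001, 1, 1)
```

With the tiny pivot the computed u is 0. After the exchange both unknowns come out as 1, correct to three places.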


The strategy of complete pivoting looks also in later columns for the largest pivot. It ex- 
changes columns as well as rows. This expense is seldom justified, and all major codes 
use partial pivoting. Multiplying a row or column by a scaling constant can also be very 
worthwhile. If the first equation above is u + 10,000v = 10,000 and we don’t rescale, 
then 1 looks like a good pivot and we would miss the essential row exchange. 

For positive definite matrices, row exchanges are not required. It is safe to accept 
the pivots as they appear. Small pivots can occur, but the matrix is not improved by row 
exchanges. When its condition number is high, the problem is in the matrix and not in the 
code. In this case the output is unavoidably sensitive to the input. 


The reader now understands how a computer actually solves Ax = b: by elimination
with partial pivoting. Compared with the theoretical description (find $A^{-1}$ and multiply
$A^{-1}b$) the details took time. But in computer time, elimination is much faster. I believe
this algorithm is also the best approach to the algebra of row spaces and nullspaces.


Operation Counts: Full Matrices and Band Matrices 


Here is a practical question about cost. How many separate operations are needed to solve 
Ax = b by elimination? This decides how large a problem we can afford. 

Look first at A, which changes gradually into U. When a multiple of row 1 is subtracted
from row 2, we do n operations. The first is a division by the pivot, to find the multiplier $\ell$.
For the other n − 1 entries along the row, the operation is a “multiply-subtract”. For
convenience, we count this as a single operation. If you regard multiplying by $\ell$ and subtracting
from the existing entry as two separate operations, multiply all our counts by 2.

The matrix A is n by n. The operation count applies to all n − 1 rows below the first.
Thus it requires n times n − 1 operations, or $n^2 - n$, to produce zeros below the first pivot.
Check: All $n^2$ entries are changed, except the n entries in the first row.

When elimination is down to k equations, the rows are shorter. We need only $k^2 - k$
operations (instead of $n^2 - n$) to clear out the column below the pivot. This is true for
$1 \le k \le n$. The last step requires no operations ($1^2 - 1 = 0$), since the pivot is set and
forward elimination is complete. The total count to reach U is the sum of $k^2 - k$ over all
values of k from 1 to n:

$$(1^2 + \cdots + n^2) - (1 + \cdots + n) = \frac{n(n+1)(2n+1)}{6} - \frac{n(n+1)}{2} = \frac{n^3 - n}{3}.$$

Those are known formulas for the sum of the first n numbers and the sum of the first n
squares. Substituting n = 1 into $n^3 - n$ gives zero. Substituting n = 100 gives a million
minus a hundred, then divide by 3. (That translates into one second on a workstation.)
We will ignore the last term n in comparison with the larger term $n^3$, to reach our main
conclusion:
conclusion: 


The multiply-subtract count for forward elimination (A to U, producing L) is $\frac{1}{3}n^3$.


That means $\frac{1}{3}n^3$ multiplications and $\frac{1}{3}n^3$ subtractions. Doubling n increases this cost by
eight (because n is cubed). 100 equations are easy, 1000 are more expensive, 10000 dense
equations are close to impossible. We need a faster computer or a lot of zeros or a new
idea.
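The count is easy to confirm by walking through the loops of forward elimination without doing any arithmetic. In this Python sketch (the function name is an assumption) each division for a multiplier and each multiply-subtract is counted as one operation, the same convention as in the text.

```python
def count_elimination_ops(n):
    """Count operations in forward elimination on a dense n by n matrix."""
    ops = 0
    for j in range(n - 1):              # pivot column j
        for i in range(j + 1, n):       # every row below the pivot
            ops += 1                    # one division to find the multiplier
            for _ in range(j + 1, n):   # remaining entries along the row
                ops += 1                # one multiply-subtract each
    return ops

count_100 = count_elimination_ops(100)
matches_formula = all(count_elimination_ops(n) == (n**3 - n) // 3 for n in range(1, 12))
ratio = count_elimination_ops(40) / count_elimination_ops(20)   # doubling n
```

For n = 100 the loops give 333,300, which is a million minus a hundred divided by 3. Doubling n multiplies the count by nearly eight.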

On the right side of the equations, the steps go much faster. We operate on single
numbers, not whole rows. Each right side needs exactly $n^2$ operations. Down and back
up we are solving two triangular systems, Lc = b forward and Ux = c backward. In back
substitution, the last unknown needs only division by the last pivot. The equation above
it needs two operations: substituting $x_n$ and dividing by its pivot. The kth step needs k
multiply-subtract operations, and the total for back substitution is

$$1 + 2 + \cdots + n = \frac{n(n+1)}{2} \approx \frac{1}{2}n^2 \ \text{operations.}$$

The forward part is similar. The $n^2$ total exactly equals the count for multiplying $A^{-1}b$!
This leaves Gaussian elimination with two big advantages over $A^{-1}b$:

1 Elimination requires $\frac{1}{3}n^3$ multiply-subtracts, compared to $n^3$ for $A^{-1}$.

2 If A is banded so are L and U: by comparison $A^{-1}$ is full of nonzeros.


Band Matrices 


These counts are improved when A has “good zeros”. A good zero is an entry that remains 
zero in L and U. The best zeros are at the beginning of a row. They require no elimination 
steps (the multipliers are zero). So we also find those same good zeros in L. That is 
especially clear for this tridiagonal matrix A: 


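Tridiagonal = bidiagonal times bidiagonal:

$$A = \begin{bmatrix} 1 & -1 & & \\ -1 & 2 & -1 & \\ & -1 & 2 & -1 \\ & & -1 & 2 \end{bmatrix} = \begin{bmatrix} 1 & & & \\ -1 & 1 & & \\ & -1 & 1 & \\ & & -1 & 1 \end{bmatrix} \begin{bmatrix} 1 & -1 & & \\ & 1 & -1 & \\ & & 1 & -1 \\ & & & 1 \end{bmatrix} = LU$$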


Rows 3 and 4 of A begin with zeros. No multiplier is needed, so L has the same zeros. 
Also columns 3 and 4 start with zeros. When a multiple of row 1 is subtracted from row 2, 
no calculation is required beyond the second column. The rows are short. They stay short! 
Figure 9.1 shows how a band matrix A has band factors L and U.

Figure 9.1: A = LU for a band matrix. Good zeros in A stay zero in L and U.

These zeros lead to a complete change in the operation count, for “half-bandwidth” w:

A band matrix has $a_{ij} = 0$ when $|i - j| \ge w$.


Thus w = 1 for a diagonal matrix, w = 2 for tridiagonal, w = n for dense. The length of
the pivot row is at most w. There are no more than w − 1 nonzeros below any pivot. Each
stage of elimination is complete after w(w − 1) operations, and the band structure survives.
There are n columns to clear out. Therefore:


Elimination on a band matrix (A to L and U) needs less than $w^2 n$ operations.


For a band matrix, the count is proportional to n instead of $n^3$. It is also proportional to $w^2$.
A full matrix has w = n and we are back to $n^3$. For an exact count, remember that the
bandwidth drops below w in the lower right corner (not enough space):

$$\text{Band} \quad \frac{w(w-1)(3n - 2w + 1)}{3} \qquad \text{Dense} \quad \frac{n(n-1)(n+1)}{3} = \frac{n^3 - n}{3}$$

On the right side, to find x from b, the cost is about 2wn (compared to the usual $n^2$). Main
point: For a band matrix the operation counts are proportional to n. This is extremely fast.
A tridiagonal matrix of order 10,000 is very cheap, provided we don’t compute $A^{-1}$. That
inverse matrix has no zeros at all:

$$A = \begin{bmatrix} 1 & -1 & 0 & 0 \\ -1 & 2 & -1 & 0 \\ 0 & -1 & 2 & -1 \\ 0 & 0 & -1 & 2 \end{bmatrix} \quad \text{has} \quad A^{-1} = U^{-1}L^{-1} = \begin{bmatrix} 4 & 3 & 2 & 1 \\ 3 & 3 & 2 & 1 \\ 2 & 2 & 2 & 1 \\ 1 & 1 & 1 & 1 \end{bmatrix}$$


We are actually worse off knowing $A^{-1}$ than knowing L and U. Multiplication by $A^{-1}$
needs the full $n^2$ steps. Solving Lc = b and Ux = c needs only 2wn. A band structure
is very common in practice, when the matrix reflects connections between near neighbors:
$a_{13} = 0$ and $a_{14} = 0$ because 1 is not a neighbor of 3 and 4.
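A short Python sketch shows how little work a tridiagonal system needs. The function name and argument layout are assumptions, and no row exchanges are attempted, so nonzero pivots are assumed. Each row needs one multiplier, and back substitution touches one entry above the diagonal.

```python
def solve_tridiagonal(lower, diag, upper, b):
    """Solve a tridiagonal system by elimination (no row exchanges, nonzero pivots assumed)."""
    n = len(diag)
    d = list(diag)                       # becomes the pivots
    c = list(b)                          # becomes the right side of Ux = c
    for i in range(1, n):
        m = lower[i - 1] / d[i - 1]      # the single multiplier for row i
        d[i] -= m * upper[i - 1]
        c[i] -= m * c[i - 1]
    x = [0.0] * n
    x[-1] = c[-1] / d[-1]
    for i in range(n - 2, -1, -1):       # back substitution, one term per row
        x[i] = (c[i] - upper[i] * x[i + 1]) / d[i]
    return x

# The 4 by 4 matrix with diagonal 1, 2, 2, 2 and -1 on both neighboring diagonals
first_column_of_inverse = solve_tridiagonal([-1, -1, -1], [1, 2, 2, 2], [-1, -1, -1],
                                            [1, 0, 0, 0])
```

Solving with the first column of I as right side returns (4, 3, 2, 1), the first column of the full inverse, at a cost proportional to n.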

We close with counts for Gauss-Jordan and Gram-Schmidt-Householder:

$A^{-1}$ costs $n^3$ multiply-subtract steps. QR costs $\frac{2}{3}n^3$ steps.

Start with $AA^{-1} = I$. The jth column of $A^{-1}$ solves $Ax_j$ = jth column of I. The left
side costs $\frac{1}{3}n^3$ as usual. (This is a one-time cost! L and U are not repeated.) The special
saving for the jth column of I comes from its first j − 1 zeros. No work is required on the
right side until elimination reaches row j. The forward cost is $\frac{1}{2}(n - j)^2$ instead of $\frac{1}{2}n^2$.
Summing over j, the total for forward elimination on the n right sides is $\frac{1}{6}n^3$. The final
multiply-subtract count for $A^{-1}$ is $n^3$ if we actually want the inverse:

$$\text{For } A^{-1}: \quad \frac{n^3}{3}\ (L \text{ and } U) + \frac{n^3}{6}\ (\text{forward}) + n\left(\frac{n^2}{2}\right)\ (\text{back substitutions}) = n^3. \qquad (1)$$


Orthogonalization (A to Q): The key difference from elimination is that each multiplier
is decided by a dot product. That takes n operations, where elimination just divides by
the pivot. Then there are n “multiply-subtract” operations to remove from column k its
projection along column j < k (see Section 4.4). The combined cost is 2n where for
elimination it is n. This factor 2 is the price of orthogonality. We are changing a dot
product to zero where elimination changes an entry to zero.


Caution To judge a numerical algorithm, it is not enough to count the operations. Beyond 
“flop counting” is a study of stability (Householder wins) and the flow of data. 


Reordering Sparse Matrices 


In discussing band matrices, we assumed a constant width w. The rows were in an optimal 
order. But for most sparse matrices in real computations, the width of the band is not 
constant and there are many zeros inside the band. Those zeros can fill in as elimination 
proceeds—they are lost. We need to renumber the equations to reduce fill-in, and thereby 
speed up elimination. 

Generally speaking, we want to move zeros to early rows and columns. Later rows 
and columns are shorter anyway. The “approximate minimum degree” algorithm in sparse 
MATLAB is greedy—it chooses the row to eliminate without counting all the consequences. 
We may reach a nearly full matrix near the end, but the total operation count to reach LU 
is still much smaller. To renumber for an absolute minimum of nonzeros in L and U is an 
NP-hard problem, much too expensive, and amd is a good compromise. 

We only need the positions of the nonzeros, not their exact values. Think of the n
rows as n nodes in a graph. Node i is connected to node j if $a_{ij} \neq 0$. Watch to see how
elimination can create a new edge from i to k. This means that a zero is filled in, which we
are trying to avoid:

When $a_{kj}$ is eliminated, a multiple of the pivot row j is subtracted from row k.

If $a_{ji}$ was nonzero in row j, then $a_{ki}$ becomes nonzero in the new row k. A new edge.

$$\begin{bmatrix} 1 & 1 & 1 \\ -2 & 1 & 0 \\ -2 & 0 & 2 \end{bmatrix} \longrightarrow \begin{bmatrix} 1 & 1 & 1 \\ 0 & 3 & 2 \\ 0 & 2 & 4 \end{bmatrix}$$

[Figure: graph on nodes 1, 2, 3. Pivot row j = 1 is subtracted from row k = 3: $a_{32} = 0$ before elimination, $a_{32} \neq 0$ after.]

In this example, the 1’s change the 0’s into 2’s. Those entries fill in.

The graph shows each step: look at the eliminationmovie on math.mit.edu/18086.
The command nnz(L) counts the nonzero multipliers in the lower triangular L, find(L)
will list them, and spy(L) shows them all.
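Fill-in and the effect of ordering can be observed with exact arithmetic in Python. The 3 by 3 matrix is the example above. The "arrow" matrix (10 on the diagonal, 1's along one full row and column) is a hypothetical test case added for illustration, and `nnz` here is a plain Python stand-in for the MATLAB command of the same name.

```python
from fractions import Fraction

def eliminate(A):
    """Forward elimination with no row exchanges (nonzero pivots assumed). Returns U."""
    U = [[Fraction(x) for x in row] for row in A]
    n = len(U)
    for j in range(n - 1):
        for i in range(j + 1, n):
            if U[i][j] != 0:                 # a zero needs no elimination step
                m = U[i][j] / U[j][j]
                for k in range(j, n):
                    U[i][k] -= m * U[j][k]
    return U

def nnz(M):
    """Number of nonzero entries."""
    return sum(1 for row in M for x in row if x != 0)

def arrow(n, full_index):
    """Hypothetical test matrix: 10 on the diagonal, 1's in one full row and column."""
    A = [[10 if i == j else 0 for j in range(n)] for i in range(n)]
    for t in range(n):
        if t != full_index:
            A[full_index][t] = 1
            A[t][full_index] = 1
    return A

example = eliminate([[1, 1, 1], [-2, 1, 0], [-2, 0, 2]])
fill_when_first = nnz(eliminate(arrow(5, 0)))   # full row and column ordered first
fill_when_last = nnz(eliminate(arrow(5, 4)))    # same pattern, ordered last
```

With the full row first, U fills in completely (15 nonzeros). With the full row last, U keeps the 9 nonzeros that the upper triangle started with. This is why the zeros belong in the early rows and columns.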

The matrix in the movie is the 2D version of our —1,2,—1 matrix. Instead of second 
differences along a line, the matrix has x and y differences on a plane grid. Each point is 
connected to its four nearest neighbors. But it is impossible to number all the points so that 
neighbors stay together. If we number by rows of the grid, there is a long wait to come 
around to the gridpoint above. 

The goal of colamd and symamd is a better ordering (permutation P) that reduces
fill-in for PA and $PAP^T$, by choosing the pivot with the fewest nonzeros below it.


Fast Orthogonalization 


There are three ways to reach the important factorization A = QR. Gram-Schmidt works 
to find the orthonormal vectors in Q. Then R is upper triangular because of the order of 
Gram-Schmidt steps. Now we look at better methods (Householder and Givens), which 
use a product of specially simple Q’s that we know are orthogonal. 

Elimination gives A = LU, orthogonalization gives A = QR. We don’t want a
triangular L, we want an orthogonal Q. L is a product of E’s, with 1’s on the diagonal and
the multiplier $\ell_{ij}$ below. Q will be a product of orthogonal matrices.

There are two simple orthogonal matrices to take the place of the E’s. The reflection
matrices $I - 2uu^T$ are named after Householder. The plane rotation matrices are named
after Givens. The simple matrix that rotates the xy plane by θ is $Q_{21}$:

$$\text{Givens rotation} \quad Q_{21} = \begin{bmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix}$$


Use $Q_{21}$, the way you used $E_{21}$, to produce a zero in the (2, 1) position. That determines
the angle θ. Bill Hager gives this example in Applied Numerical Linear Algebra:

$$Q_{21}A = \begin{bmatrix} .6 & .8 & 0 \\ -.8 & .6 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 90 & -153 & 114 \\ 120 & -79 & -223 \\ 200 & -40 & 395 \end{bmatrix} = \begin{bmatrix} 150 & -155 & -110 \\ 0 & 75 & -225 \\ 200 & -40 & 395 \end{bmatrix}$$

The zero came from −.8(90) + .6(120). No need to find θ, what we needed was cos θ:

$$\cos\theta = \frac{90}{\sqrt{90^2 + 120^2}} \quad \text{and} \quad \sin\theta = \frac{-120}{\sqrt{90^2 + 120^2}} \qquad (2)$$

Now we attack the (3, 1) entry. The rotation will be in rows and columns 3 and 1. The
numbers cos θ and sin θ are determined from 150 and 200, instead of 90 and 120.

$$Q_{31}Q_{21}A = \begin{bmatrix} .6 & 0 & .8 \\ 0 & 1 & 0 \\ -.8 & 0 & .6 \end{bmatrix} \begin{bmatrix} 150 & \cdot & \cdot \\ 0 & \cdot & \cdot \\ 200 & \cdot & \cdot \end{bmatrix} = \begin{bmatrix} 250 & -125 & 250 \\ 0 & 75 & -225 \\ 0 & 100 & 325 \end{bmatrix}$$

One more step to R. The (3, 2) entry has to go. The numbers cos θ and sin θ now come
from 75 and 100. The rotation is now in rows and columns 2 and 3:

$$Q_{32}Q_{31}Q_{21}A = \begin{bmatrix} 1 & 0 & 0 \\ 0 & .6 & .8 \\ 0 & -.8 & .6 \end{bmatrix} \begin{bmatrix} 250 & -125 & \cdot \\ 0 & 75 & \cdot \\ 0 & 100 & \cdot \end{bmatrix} = \begin{bmatrix} 250 & -125 & 250 \\ 0 & 125 & 125 \\ 0 & 0 & 375 \end{bmatrix}$$

We have reached the upper triangular R. What is Q? Move the plane rotations $Q_{ij}$ to the
other side to find A = QR, just as you moved the elimination matrices $E_{ij}$ to the other
side to find A = LU:

$$Q_{32}Q_{31}Q_{21}A = R \quad \text{means} \quad A = (Q_{21}^{-1}Q_{31}^{-1}Q_{32}^{-1})R = QR. \qquad (3)$$

The inverse of each $Q_{ij}$ is $Q_{ij}^T$ (rotation through −θ). The inverse of $E_{ij}$ was not an
orthogonal matrix! LU and QR are similar but not the same.
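The three rotations can be replayed in Python on Hager's matrix. This is a sketch, not a library routine: `givens` is a hypothetical helper that takes cos θ and sin θ from the pivot entry and the entry to be removed, with the sign convention of equation (2).

```python
import math

def givens(A, i, j):
    """Rotate rows j and i of A so that entry (i, j) becomes zero. Returns a new matrix."""
    a, b = A[j][j], A[i][j]
    r = math.hypot(a, b)
    c, s = a / r, -b / r                  # cos and sin as in equation (2)
    B = [row[:] for row in A]
    for k in range(len(A[0])):
        B[j][k] = c * A[j][k] - s * A[i][k]
        B[i][k] = s * A[j][k] + c * A[i][k]
    return B

A = [[90, -153, 114], [120, -79, -223], [200, -40, 395]]
step1 = givens(A, 1, 0)       # Q21 A: zero in the (2, 1) position
step2 = givens(step1, 2, 0)   # Q31 Q21 A: zero in the (3, 1) position
R = givens(step2, 2, 1)       # Q32 Q31 Q21 A = R
```

The rows are indexed from 0 in the code, so `givens(A, 1, 0)` is the rotation called $Q_{21}$ in the text.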

Householder reflections are faster because each one clears out a whole column below
the diagonal. Watch how the first column $a_1$ of A becomes column $r_1$ of R:

$$H_1 a_1 = (I - 2u_1u_1^T)\,a_1 = \begin{bmatrix} \pm\|a_1\| \\ 0 \\ \vdots \\ 0 \end{bmatrix} = r_1 \qquad (4)$$

The length was not changed, and $u_1$ is in the direction of $a_1 - r_1$. We have n − 1 entries
in the unit vector $u_1$ to get n − 1 zeros in $r_1$. (Rotations had one angle θ to get one zero.)
When we reach column k, n − k available choices in the unit vector $u_k$ lead to n − k zeros
in $r_k$. We just store the u’s and r’s to know Q and R:

$$\text{Inverse of } H_i \text{ is } H_i \qquad (H_{n-1} \cdots H_1)A = R \quad \text{means} \quad A = (H_1 \cdots H_{n-1})R = QR. \qquad (5)$$


This is how LAPACK improves on Gram-Schmidt. Q is exactly orthogonal. 
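A bare-bones Python version of the reflections is sketched below. It is not LAPACK's code: the function name is an assumption, only R is formed (the u's are not stored), and the sign of each $r_k$ is chosen opposite to the leading entry, a common choice that avoids cancellation in $a_k - r_k$.

```python
import math

def householder_R(A):
    """Reduce a square matrix to upper triangular R by reflections H = I - 2uu^T."""
    R = [[float(x) for x in row] for row in A]
    n = len(R)
    for k in range(n - 1):
        x = [R[i][k] for i in range(k, n)]          # column k, from the diagonal down
        norm = math.sqrt(sum(t * t for t in x))
        if norm == 0:
            continue
        alpha = -norm if x[0] >= 0 else norm        # r_k = (alpha, 0, ..., 0)
        u = x[:]
        u[0] -= alpha                               # u is in the direction of a_k - r_k
        length = math.sqrt(sum(t * t for t in u))
        u = [t / length for t in u]
        for j in range(k, n):                       # apply H to each remaining column
            dot = sum(u[i] * R[k + i][j] for i in range(n - k))
            for i in range(n - k):
                R[k + i][j] -= 2 * dot * u[i]
    return R

R = householder_R([[90, -153, 114], [120, -79, -223], [200, -40, 395]])
diagonal_sizes = [abs(R[i][i]) for i in range(3)]
```

On Hager's matrix the diagonal of R has the sizes 250, 125, 375 found by the rotations. The signs of the rows can differ, because the triangular factor of an invertible matrix is determined only up to signs.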

Section 9.3 explains how A = QR is used in the other big computation of linear
algebra, the eigenvalue problem. The factors QR are reversed to give $A_1 = RQ$ which
is $Q^{-1}AQ$. Since $A_1$ is similar to A, the eigenvalues are unchanged. Then $A_1$ is factored
into $Q_1R_1$, and reversing the factors gives $A_2$. Amazingly, the entries below the diagonal
get smaller in $A_1, A_2, A_3, \ldots$ and we can identify the eigenvalues. This is the “QR method”
for $Ax = \lambda x$, a big success of numerical linear algebra.
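A tiny experiment shows the idea for a 2 by 2 matrix. This Python sketch is the plain unshifted iteration (practical codes add shifts and other refinements), and the test matrix with eigenvalues 3 and 1 is an assumption chosen so the answer is known.

```python
import math

def qr_step(A):
    """One step of the unshifted QR method for a 2 by 2 matrix: factor A = QR, return RQ."""
    (a, b), (c, d) = A
    r = math.hypot(a, c)
    cs, sn = a / r, c / r                 # Q = [[cs, -sn], [sn, cs]] is a plane rotation
    R = [[cs * a + sn * c, cs * b + sn * d],
         [0.0, -sn * b + cs * d]]         # R = Q^T A is upper triangular
    return [[R[0][0] * cs + R[0][1] * sn, -R[0][0] * sn + R[0][1] * cs],
            [R[1][1] * sn, R[1][1] * cs]]

A = [[2.0, 1.0], [1.0, 2.0]]              # eigenvalues 3 and 1
first = qr_step(A)                        # A_1 = RQ
Ak = A
for _ in range(40):
    Ak = qr_step(Ak)
```

After one step $A_1$ has trace 4 and determinant 3, the same as A. After 40 steps the entry below the diagonal is negligible and the diagonal shows 3 and 1.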


Problem Set 9.1 


1 Find the two pivots with and without row exchange to maximize the pivot:

$$A = \begin{bmatrix} .001 & 0 \\ 1 & 1000 \end{bmatrix}.$$

With row exchanges to maximize pivots, why are no entries of L larger than 1?
Find a 3 by 3 matrix A with all $|a_{ij}| \le 1$ and $|\ell_{ij}| \le 1$ but third pivot = 4.

2 Compute the exact inverse of the Hilbert matrix A by elimination. Then compute
$A^{-1}$ again by rounding all numbers to three figures:

$$\text{Ill-conditioned matrix} \quad A = \text{hilb}(3) = \begin{bmatrix} 1 & \frac{1}{2} & \frac{1}{3} \\ \frac{1}{2} & \frac{1}{3} & \frac{1}{4} \\ \frac{1}{3} & \frac{1}{4} & \frac{1}{5} \end{bmatrix}.$$

3 For the same A compute b = Ax for x = (1, 1, 1) and x = (0, 6, −3.6). A small
change Δb produces a large change Δx.

4 Find the eigenvalues (by computer) of the 8 by 8 Hilbert matrix $a_{ij} = 1/(i + j - 1)$.
In the equation Ax = b with $\|b\| = 1$, how large can $\|x\|$ be? If b has roundoff
error less than $10^{-16}$, how large an error can this cause in x? See Section 9.2.

5 For back substitution with a band matrix (width w), show that the number of
multiplications to solve Ux = c is approximately wn.

6 If you know L and U and Q and R, is it faster to solve LUx = b or QRx = b?

7 Show that the number of multiplications to invert an upper triangular n by n matrix
is about $\frac{1}{6}n^3$. Use back substitution on the columns of I, upward from 1’s.

8 Choosing the largest available pivot in each column (partial pivoting), factor each A
into PA = LU:

$$A = \begin{bmatrix} 1 & 0 \\ 2 & 2 \end{bmatrix} \quad \text{and} \quad A = \begin{bmatrix} 1 & 0 & 1 \\ 2 & 2 & 0 \\ 0 & 2 & 0 \end{bmatrix}.$$

9 Put 1’s on the three central diagonals of a 4 by 4 tridiagonal matrix. Find the
cofactors of the six zero entries. Those entries are nonzero in $A^{-1}$.

10 (Suggested by C. Van Loan.) Find the LU factorization and solve by elimination
when $\varepsilon = 10^{-3}, 10^{-6}, 10^{-9}, 10^{-12}, 10^{-15}$:

$$\begin{bmatrix} \varepsilon & 1 \\ 1 & 1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 1 + \varepsilon \\ 2 \end{bmatrix}.$$

The true x is (1, 1). Make a table to show the error for each ε. Exchange the two
equations and solve again. The errors should almost disappear.

11 (a) Choose sin θ and cos θ to triangularize A, and find R:

$$\text{Givens rotation} \quad Q_{21}A = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \begin{bmatrix} 1 & -1 \\ 3 & 5 \end{bmatrix} = \begin{bmatrix} * & * \\ 0 & * \end{bmatrix} = R.$$

(b) Choose sin θ and cos θ to make $QAQ^{-1}$ triangular. What are the eigenvalues?

12 When A is multiplied by a plane rotation $Q_{ij}$, which 2n entries of A are changed?
When $Q_{ij}A$ is multiplied on the right by $Q_{ij}^{-1}$, which entries are changed now?

13 How many multiplications and how many additions are used to compute $Q_{ij}A$?
Careful organization of the whole sequence of rotations gives $\frac{2}{3}n^3$ multiplications
and $\frac{2}{3}n^3$ additions, the same as for QR by reflectors and twice as many as for LU.

Challenge Problems 


(Turning a robot hand) The robot produces any 3 by 3 rotation A from plane
rotations around the x, y, z axes. Then Q_32 Q_31 Q_21 A = R, where A is orthogonal so
R is I! The three robot turns are in A = Q_21^{-1} Q_31^{-1} Q_32^{-1}. The three angles are “Euler
angles” and det Q = 1 to avoid reflection. Start by choosing cos θ and sin θ so that

$$Q_{21}A = \begin{bmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix} \frac{1}{3} \begin{bmatrix} -1 & 2 & 2 \\ 2 & -1 & 2 \\ 2 & 2 & -1 \end{bmatrix} \quad\text{is zero in the (2, 1) position.}$$


Create the 10 by 10 second difference matrix K = toeplitz([2 -1 zeros(1,8)]).
Permute rows and columns randomly by KK = K(randperm(10), randperm(10)).
Factor by [L, U] = lu(K) and [LL, UU] = lu(KK), and count nonzeros by nnz(L)
and nnz(LL). In this case L is in perfect tridiagonal order, but not LL.


Another ordering for this matrix K colors the meshpoints alternately red and black.
This permutation P changes the normal 1, ..., 10 to 1, 3, 5, 7, 9, 2, 4, 6, 8, 10:

$$\text{Red-black ordering} \qquad PKP^{T} = \begin{bmatrix} 2I & D \\ D^{T} & 2I \end{bmatrix}. \quad \text{Find the matrix } D.$$

So many interesting experiments are possible. If you send good ideas they can
go on the linear algebra website math.mit.edu/linearalgebra. I also recommend
learning the command B = sparse(A), after which find(B) will list the nonzero
entries and lu(B) will factor B using that sparse format for L and U. Only the
nonzeros are computed, where ordinary (dense) MATLAB computes all the zeros too.


Jeff Stuart has created a student activity that brilliantly demonstrates ill-conditioning: 


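$$\begin{bmatrix} 1 & 1.0001 \\ 1 & 1.0000 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 3.0001 + e \\ 3.0000 + E \end{bmatrix} \qquad \text{With errors } e \text{ and } E: \quad \begin{aligned} x &= 2 - 10000(e - E) \\ y &= 1 + 10000(e - E) \end{aligned}$$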


The algebra shows how errors e and E are amplified by 10000 unless e = E. 

As always, the solution of a 2 by 2 system is the meeting point of two lines. 
The neat idea is to replace mathematical lines by long sticks held by students. 
The sticks for these two equations are almost parallel, and A is almost singular. 
Perpendicular sticks come from well-conditioned equations. 

In Stuart’s Shake a Stick activity, the students plot where the sticks cross 
(after multiple shakes). See www.plu.edu/~stuartjl for the wild movements of that 
crossing point (x, y), when the sticks are nearly parallel. 
9.2 Norms and Condition Numbers


How do we measure the size of a matrix? For a vector, the length is ||x||. For a matrix,
the norm is ||A||. This word “norm” is sometimes used for vectors, instead of length. It
is always used for matrices, and there are many ways to measure ||A||. We look at the
requirements on all “matrix norms” and then choose one.

Frobenius squared all the |a_ij|² and added; his norm ||A||_F is the square root. This treats
A like a long vector with n² components: sometimes useful, but not the choice here.

I prefer to start with a vector norm. The triangle inequality says that ||x + y|| is not
greater than ||x|| + ||y||. The length of 2x or −2x is doubled to 2||x||. The same rules
will apply to matrix norms:

$$\|A + B\| \le \|A\| + \|B\| \quad\text{and}\quad \|cA\| = |c|\,\|A\|. \tag{1}$$


The second requirements for a matrix norm are new, because matrices multiply. The
norm ||A|| controls the growth from x to Ax, and from B to AB:

$$\text{Growth factor } \|A\| \qquad \|Ax\| \le \|A\|\,\|x\| \quad\text{and}\quad \|AB\| \le \|A\|\,\|B\|. \tag{2}$$

This leads to a natural way to define ||A||, the norm of a matrix:

$$\|A\| = \max_{x \ne 0} \frac{\|Ax\|}{\|x\|}. \tag{3}$$


||Ax||/||x|| is never larger than ||A|| (its maximum). This says that ||Ax|| ≤ ||A|| ||x||.


Example 1 If A is the identity matrix I, the ratios are ||x||/||x||. Therefore ||I|| = 1. If
A is an orthogonal matrix Q, lengths are again preserved: ||Qx|| = ||x||. The ratios still
give ||Q|| = 1. An orthogonal Q is good to compute with: errors don’t grow.


Example 2. The norm of a diagonal matrix is its largest entry (using absolute values): 


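$$A = \begin{bmatrix} 2 & 0 \\ 0 & 3 \end{bmatrix} \text{ has norm } \|A\| = 3. \quad \text{The eigenvector } x = \begin{bmatrix} 0 \\ 1 \end{bmatrix} \text{ has } Ax = 3x.$$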


The eigenvalue is 3. For this A (but not all A), the largest eigenvalue equals the norm. 


For a positive definite symmetric matrix the norm is ||A|| = λ_max(A).


Choose x to be the eigenvector with maximum eigenvalue. Then ||Ax||/||x|| equals λ_max.
The point is that no other x can make the ratio larger. The matrix is A = QΛQ^T, and the
orthogonal matrices Q and Q^T leave lengths unchanged. So the ratio to maximize is really
||Λx||/||x||. The norm is the largest eigenvalue in the diagonal Λ.


Symmetric matrices Suppose A is symmetric but not positive definite. A = QΛQ^T is
still true. Then the norm is the largest of |λ_1|, |λ_2|, ..., |λ_n|. We take absolute values,
because the norm is only concerned with length. For an eigenvector ||Ax|| = ||λx|| = |λ|
times ||x||. The x that gives the maximum ratio is the eigenvector for the maximum |λ|.


Unsymmetric matrices If A is not symmetric, its eigenvalues may not measure its true
size. The norm can be larger than any eigenvalue. A very unsymmetric example has
λ_1 = λ_2 = 0 but its norm is not zero:

$$\|A\| > \lambda_{\max} \qquad A = \begin{bmatrix} 0 & 2 \\ 0 & 0 \end{bmatrix} \text{ has norm } \|A\| = \max_{x \ne 0} \frac{\|Ax\|}{\|x\|} = 2.$$

The vector x = (0, 1) gives Ax = (2, 0). The ratio of lengths is 2/1. This is the maximum
ratio ||A||, even though x is not an eigenvector.


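It is the symmetric matrix A^T A, not the unsymmetric A, that has eigenvector
x = (0, 1). The norm is really decided by the largest eigenvalue of A^T A:

$$\|A\|^2 = \max_{x \ne 0} \frac{\|Ax\|^2}{\|x\|^2} = \max_{x \ne 0} \frac{x^T A^T A x}{x^T x} = \lambda_{\max}(A^T A). \tag{4}$$

The unsymmetric example with λ_max(A) = 0 has λ_max(A^T A) = 4:

$$A = \begin{bmatrix} 0 & 2 \\ 0 & 0 \end{bmatrix} \text{ leads to } A^T A = \begin{bmatrix} 0 & 0 \\ 0 & 4 \end{bmatrix} \text{ with } \lambda_{\max} = 4. \text{ So the norm is } \|A\| = \sqrt{4}.$$

For any A Choose x to be the eigenvector of A^T A with largest eigenvalue λ_max. The ratio
in equation (4) is x^T A^T A x = x^T(λ_max)x divided by x^T x. This is λ_max.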

No x can give a larger ratio. The symmetric matrix A^T A has eigenvalues λ_1, ..., λ_n
and orthonormal eigenvectors q_1, q_2, ..., q_n. Every x is a combination of those vectors.
Try this combination in the ratio and remember that q_i^T q_j = 0 when i ≠ j:

$$\frac{x^T A^T A x}{x^T x} = \frac{(c_1 q_1 + \cdots + c_n q_n)^T (c_1 \lambda_1 q_1 + \cdots + c_n \lambda_n q_n)}{(c_1 q_1 + \cdots + c_n q_n)^T (c_1 q_1 + \cdots + c_n q_n)} = \frac{c_1^2 \lambda_1 + \cdots + c_n^2 \lambda_n}{c_1^2 + \cdots + c_n^2}.$$

The maximum ratio λ_max is when all c’s are zero, except the one that multiplies λ_max.


Note 1 The ratio in equation (4) is the Rayleigh quotient for the symmetric matrix A^T A.
Its maximum is the largest eigenvalue λ_max(A^T A). The minimum ratio is λ_min(A^T A).
If you substitute any vector x into the Rayleigh quotient x^T A^T Ax/x^T x, you are
guaranteed to get a number between λ_min(A^T A) and λ_max(A^T A).


Note 2 The norm ||A|| equals the largest singular value σ_max of A. The singular values
σ_1, ..., σ_r are the square roots of the positive eigenvalues of A^T A. So certainly
σ_max = (λ_max)^{1/2}. Since U and V are orthogonal in A = UΣV^T, the norm is ||A|| = σ_max.
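
The rule ||A|| = square root of λ_max(A^T A) is easy to try on 2 by 2 matrices. The following is only a small Python sketch, not code from the book: the function name spectral_norm_2x2 is hypothetical, and it uses the quadratic formula for the eigenvalues of the symmetric 2 by 2 matrix A^T A.

```python
import math

def spectral_norm_2x2(A):
    """Return ||A|| = sqrt(lambda_max(A^T A)) for a 2 by 2 matrix A."""
    (a, b), (c, d) = A
    # Entries of the symmetric matrix A^T A = [[p, q], [q, r]].
    p = a * a + c * c
    q = a * b + c * d
    r = b * b + d * d
    # Largest eigenvalue of a symmetric 2 by 2 matrix (quadratic formula).
    mean = (p + r) / 2.0
    radius = math.hypot((p - r) / 2.0, q)
    return math.sqrt(mean + radius)

print(spectral_norm_2x2([[0, 2], [0, 0]]))   # both eigenvalues are 0, but the norm is 2.0
print(spectral_norm_2x2([[6, 0], [0, 2]]))   # positive definite diagonal matrix: 6.0
```

For the unsymmetric example the eigenvalues of A say nothing about its size, while A^T A gives the correct norm 2.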
The Condition Number of A


Section 9.1 showed that roundoff error can be serious. Some systems are sensitive, others 
are not so sensitive. The sensitivity to error is measured by the condition number. This 
is the first chapter in the book which intentionally introduces errors. We want to estimate 
how much they change x. 

The original equation is Ax = b. Suppose the right side is changed to b + Δb
because of roundoff or measurement error. The solution is then changed to x + Δx. Our
goal is to estimate the change Δx in the solution from the change Δb in the equation.
Subtraction gives the error equation A(Δx) = Δb:

$$\text{Subtract } Ax = b \text{ from } A(x + \Delta x) = b + \Delta b \text{ to find } A(\Delta x) = \Delta b. \tag{5}$$

The error is Δx = A^{-1}Δb. It is large when A^{-1} is large (then A is nearly singular). The
error Δx is especially large when Δb points in the worst direction, the one that is amplified
most by A^{-1}. The worst error has ||Δx|| = ||A^{-1}|| ||Δb||.

This error bound ||A^{-1}|| has one serious drawback. If we multiply A by 1000, then
A^{-1} is divided by 1000. The matrix looks a thousand times better. But a simple rescaling
cannot change the reality of the problem. It is true that Δx will be divided by 1000, but so
will the exact solution x = A^{-1}b. The relative error ||Δx||/||x|| will stay the same. It is
this relative change in x that should be compared to the relative change in b.

Comparing relative errors will now lead to the “condition number” c = ||A|| ||A^{-1}||.
Multiplying A by 1000 does not change this number, because A^{-1} is divided by 1000 and
the condition number c stays the same. It measures the sensitivity of Ax = b.


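The solution error is less than c = ||A|| ||A^{-1}|| times the problem error:

$$\frac{\|\Delta x\|}{\|x\|} \le c\,\frac{\|\Delta b\|}{\|b\|}. \tag{6}$$

If the problem error is ΔA (an error in A instead of b), still c controls Δx:

$$\frac{\|\Delta x\|}{\|x + \Delta x\|} \le c\,\frac{\|\Delta A\|}{\|A\|}. \tag{7}$$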


Proof The original equation is b = Ax. The error equation (5) is Δx = A^{-1}Δb.
Apply the key property ||Ax|| ≤ ||A|| ||x|| of matrix norms:

$$\|b\| \le \|A\|\,\|x\| \quad\text{and}\quad \|\Delta x\| \le \|A^{-1}\|\,\|\Delta b\|.$$

Multiply the left sides to get ||b|| ||Δx||, and multiply the right sides to get c||x|| ||Δb||.
Divide both sides by ||b|| ||x||. The left side is now the relative error ||Δx||/||x||. The
right side is now the upper bound in equation (6).
The same condition number c = ||A|| ||A^{-1}|| appears when the error is in the matrix.
We have ΔA instead of Δb in the error equation:

$$\text{Subtract } Ax = b \text{ from } (A + \Delta A)(x + \Delta x) = b \text{ to find } A(\Delta x) = -(\Delta A)(x + \Delta x).$$

Multiply the last equation by A^{-1} and take norms to reach equation (7):

$$\|\Delta x\| \le \|A^{-1}\|\,\|\Delta A\|\,\|x + \Delta x\| \quad\text{or}\quad \frac{\|\Delta x\|}{\|x + \Delta x\|} \le \|A\|\,\|A^{-1}\|\,\frac{\|\Delta A\|}{\|A\|}.$$


Conclusion Errors enter in two ways. They begin with an error ΔA or Δb, a wrong
matrix or a wrong b. This problem error is amplified (a lot or a little) into the solution error
Δx. That error is bounded, relative to x itself, by the condition number c.

The error Δb depends on computer roundoff and on the original measurements of b.
The error ΔA also depends on the elimination steps. Small pivots tend to produce large
errors in L and U. Then L + ΔL times U + ΔU equals A + ΔA. When ΔA or the
condition number is very large, the error Δx can be unacceptable.


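Example 3 When A is symmetric, c = ||A|| ||A^{-1}|| comes from the eigenvalues:

$$A = \begin{bmatrix} 6 & 0 \\ 0 & 2 \end{bmatrix} \text{ has norm } 6. \qquad A^{-1} = \begin{bmatrix} \tfrac16 & 0 \\ 0 & \tfrac12 \end{bmatrix} \text{ has norm } \tfrac12.$$

This A is symmetric positive definite. Its norm is λ_max = 6. The norm of A^{-1} is
1/λ_min = 1/2. Multiplying norms gives the condition number ||A|| ||A^{-1}|| = λ_max/λ_min:

$$\text{Condition number for positive definite } A \qquad c = \frac{\lambda_{\max}}{\lambda_{\min}} = \frac{6}{2} = 3.$$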


Example 4 Keep the same A, with eigenvalues 6 and 2. To make x small, choose b along
the first eigenvector (1, 0). To make Δx large, choose Δb along the second eigenvector
(0, 1). Then x = (1/6)b and Δx = (1/2)Δb. The ratio ||Δx||/||x|| is exactly c = 3 times the
ratio ||Δb||/||b||.

This shows that the worst error allowed by the condition number ||A|| ||A^{-1}|| can
actually happen. Here is a useful rule of thumb, experimentally verified for Gaussian
elimination: The computer can lose log c decimal places to roundoff error.
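
Example 4 can be checked numerically. This is a hedged sketch in plain Python, not the book’s code: the helper names are hypothetical, and the size 0.001 of the perturbation Δb is an arbitrary choice made only for illustration.

```python
import math

def solve_diagonal(diag, b):
    """Solve Ax = b when A is diagonal with the given diagonal entries."""
    return [bi / di for di, bi in zip(diag, b)]

def length(v):
    return math.sqrt(sum(vi * vi for vi in v))

diag = [6.0, 2.0]                 # the diagonal A with eigenvalues 6 and 2
c = max(diag) / min(diag)         # condition number lambda_max / lambda_min = 3

b = [1.0, 0.0]                    # along the first eigenvector: x is small
db = [0.0, 0.001]                 # along the second eigenvector (size is an assumption)
x = solve_diagonal(diag, b)
dx = solve_diagonal(diag, db)

rel_b = length(db) / length(b)    # relative change in b
rel_x = length(dx) / length(x)    # relative change in x
print(c, rel_x / rel_b)           # the amplification equals the condition number
```

Any other choice of b and Δb gives an amplification between 1/c and c; this pair is the worst case.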


Problem Set 9.2 
1 Find the norms ||A|| = λ_max and condition numbers c = λ_max/λ_min of these positive
definite matrices:


ed Ege 
2 Find the norms and condition numbers from the square roots of λ_max(A^T A) and
λ_min(A^T A). Without positive definiteness in A, we go to A^T A!

$$\begin{bmatrix} -2 & 0 \\ 0 & 2 \end{bmatrix} \qquad \begin{bmatrix} 1 & 1 \\ 0 & 0 \end{bmatrix} \qquad \begin{bmatrix} 1 & 1 \\ -1 & 1 \end{bmatrix}$$

3 Explain these two inequalities from the definitions (3) of ||A|| and ||B||:

$$\|ABx\| \le \|A\|\,\|Bx\| \le \|A\|\,\|B\|\,\|x\|.$$

From the ratio of ||ABx|| to ||x||, deduce that ||AB|| ≤ ||A|| ||B||. This is the key to
using matrix norms. The norm of A^n is never larger than ||A||^n.


4 Use ||AA^{-1}|| ≤ ||A|| ||A^{-1}|| to prove that the condition number is at least 1.


5 Why is I the only symmetric positive definite matrix that has λ_max = λ_min = 1?
Then the only other matrices with ||A|| = 1 and ||A^{-1}|| = 1 must have A^T A = I.
Those are ____ matrices: perfectly conditioned.


6 Orthogonal matrices have norm ||Q|| = 1. If A = QR show that ||A|| ≤ ||R|| and
also ||R|| ≤ ||A||. Then ||A|| = ||Q|| ||R||. Find an example of A = LU with
||A|| < ||L|| ||U||.


7 (a) Which famous inequality gives ||(A + B)x|| ≤ ||Ax|| + ||Bx|| for every x?
(b) Why does the definition (3) of matrix norms lead to ||A + B|| ≤ ||A|| + ||B||?

8 Show that if λ is any eigenvalue of A, then |λ| ≤ ||A||. Start from Ax = λx.


9 The “spectral radius” ρ(A) = |λ_max| is the largest absolute value of the eigenvalues.
Show with 2 by 2 examples that ρ(A + B) ≤ ρ(A) + ρ(B) and ρ(AB) ≤ ρ(A)ρ(B)
can both be false. The spectral radius is not acceptable as a norm.


10 (a) Explain why A and A^{-1} have the same condition number.
(b) Explain why A and A^T have the same norm, based on λ(A^T A) and λ(AA^T).


11 Estimate the condition number of the ill-conditioned matrix A = [1 1; 1 1.0001].


12 Why is the determinant of A no good as a norm? Why is it no good as a condition 
number? 


13 (Suggested by C. Moler and C. Van Loan.) Compute b − Ay and b − Az when

$$b = \begin{bmatrix} .217 \\ .254 \end{bmatrix} \qquad A = \begin{bmatrix} .780 & .563 \\ .913 & .659 \end{bmatrix} \qquad y = \begin{bmatrix} .341 \\ -.087 \end{bmatrix} \qquad z = \begin{bmatrix} .999 \\ -1.0 \end{bmatrix}$$

Is y closer than z to solving Ax = b? Answer in two ways: Compare the residual
b − Ay to b − Az. Then compare y and z to the true x = (1, −1). Both answers can
be right. Sometimes we want a small residual, sometimes a small Δx.


14 (a) Compute the determinant of A in Problem 13. Compute A^{-1}.
(b) If possible compute ||A|| and ||A^{-1}|| and show that c > 10^6.
Problems 15-19 are about vector norms other than the usual ||x|| = √(x · x).


15 The “ℓ¹ norm” and the “ℓ^∞ norm” of x = (x_1, ..., x_n) are

$$\|x\|_1 = |x_1| + \cdots + |x_n| \quad\text{and}\quad \|x\|_\infty = \max_{1 \le i \le n} |x_i|.$$

Compute the norms ||x|| and ||x||_1 and ||x||_∞ of these two vectors in R^5:

x = (1, 1, 1, 1, 1)    x = (.1, .7, .3, .4, .5).


16 Prove that ||x||_∞ ≤ ||x|| ≤ ||x||_1. Show from the Schwarz inequality that the ratios
||x||/||x||_∞ and ||x||_1/||x|| are never larger than √n. Which vector (x_1, ..., x_n)
gives ratios equal to √n?


17 All vector norms must satisfy the triangle inequality. Prove that

$$\|x + y\|_\infty \le \|x\|_\infty + \|y\|_\infty \quad\text{and}\quad \|x + y\|_1 \le \|x\|_1 + \|y\|_1.$$

18 Vector norms must also satisfy ||cx|| = |c| ||x||. The norm must be positive except
when x = 0. Which of these are norms for vectors (x_1, x_2) in R²?

$$\|x\|_A = |x_1| + 2|x_2| \qquad \|x\|_B = \min(|x_1|, |x_2|)$$
$$\|x\|_C = \|x\| + \|x\|_\infty \qquad \|x\|_D = \|Ax\| \ \text{(this answer depends on } A).$$

Challenge Problems 


19 Show that x^T y ≤ ||x||_1 ||y||_∞ by choosing components y_i = ±1 to make x^T y as
large as possible.


20 The eigenvalues of the −1, 2, −1 difference matrix K are λ = 2 − 2cos(jπ/(n+1)).
Estimate λ_min and λ_max and c = cond(K) = λ_max/λ_min as n increases: c ≈ Cn²
with what constant C? Test this estimate with eig(K) and cond(K) for n = 10, 100, 1000.


21 For unsymmetric matrices, the spectral radius ρ = max |λ_i| is not a norm
(Problem 9). But still ||A^n|| grows or decays like ρ^n for large n. Compare those
numbers for A = [1 1; 0 1.1] using the command norm.

In particular A^n → 0 when ρ < 1. This is the key to Section 9.3 with A = S^{-1}T.
9.3 Iterative Methods and Preconditioners


Up to now, our approach to Ax = b has been direct. We accepted A as it came. We 
attacked it by elimination with row exchanges. This section is about iterative methods, 
which replace A by a simpler matrix S. The difference T = S — A is moved over to the 
right side of the equation. The problem becomes easier to solve, with S instead of A. But 
there is a price—the simpler system has to be solved over and over. 

An iterative method is easy to invent. Just split A (carefully) into S − T.

$$\text{Rewrite } Ax = b \text{ as} \qquad Sx = Tx + b. \tag{1}$$

The novelty is to solve (1) iteratively. Each guess x_k leads to the next x_{k+1}:

$$Sx_{k+1} = Tx_k + b. \tag{2}$$

Start with any x_0. Then solve Sx_1 = Tx_0 + b. Continue to the second iteration Sx_2 =
Tx_1 + b. A hundred iterations are very common, often more. Stop when (and if!) the new
vector x_{k+1} is sufficiently close to x_k, or when the residual r_k = b − Ax_k is near zero.
We choose the stopping test. Our hope is to get near the true solution, more quickly than by
elimination. When the sequence x_k converges, its limit x = x_∞ does solve equation (1).
The proof is to let k → ∞ in equation (2).

The two goals of the splitting A = S − T are speed per step and fast convergence.
The speed of each step depends on S and the speed of convergence depends on S^{-1}T:


1 Equation (2) should be easy to solve for x_{k+1}. The “preconditioner” S could be the
diagonal or triangular part of A. A fast way uses S = L_0U_0, where those factors
have many zeros compared to the exact A = LU. This is “incomplete LU”.


2 The difference x − x_k (this is the error e_k) should go quickly to zero. Subtracting
equation (2) from (1) cancels b, and it leaves the equation for the error e_k:

$$\text{Error equation} \qquad Se_{k+1} = Te_k \quad\text{which means}\quad e_{k+1} = S^{-1}Te_k. \tag{3}$$

At every step the error is multiplied by S^{-1}T. If S^{-1}T is small, its powers go quickly to
zero. But what is “small”?

The extreme splitting is S = A and T = 0. Then the first step of the iteration is the
original Ax = b. Convergence is perfect and S^{-1}T is zero. But the cost of that step is
what we wanted to avoid. The choice of S is a battle between speed per step (a simple S)
and fast convergence (S close to A). Here are some popular choices:


J S = diagonal part of A (the iteration is called Jacobi’s method) 
GS S = lower triangular part of A including the diagonal (Gauss-Seidel method) 
SOR S = combination of Jacobi and Gauss-Seidel (successive overrelaxation) 


ILU S = approximate L times approximate U (incomplete LU method).

Our first question is pure linear algebra: When do the x_k’s converge to x? The answer
uncovers the number |λ|max that controls convergence. In examples of J and GS and SOR,
we will compute this “spectral radius” |λ|max. It is the largest eigenvalue (in absolute value)
of the iteration matrix B = S^{-1}T.


The Spectral Radius ρ(B) Controls Convergence


Equation (3) is e_{k+1} = S^{-1}Te_k. Every iteration step multiplies the error by the same
matrix B = S^{-1}T. The error after k steps is e_k = B^k e_0. The error approaches zero
if the powers of B = S^{-1}T approach zero. It is beautiful to see how the eigenvalues of
B, the largest eigenvalue in particular, control the matrix powers B^k.


Convergence The powers B^k approach zero if and only if every eigenvalue of B has |λ| < 1.
The rate of convergence is controlled by the spectral radius of B: ρ = max |λ(B)|.


The test for convergence is |λ|max < 1. Real eigenvalues must lie between −1 and 1.
Complex eigenvalues λ = a + ib must have |λ|² = a² + b² < 1. (Chapter 10 will
discuss complex numbers.) The spectral radius “rho” is the largest distance from 0 to the
eigenvalues λ_1, ..., λ_n of B = S^{-1}T. This is ρ = |λ|max.

To see why |λ|max < 1 is necessary, suppose the starting error e_0 happens to be an
eigenvector of B. After one step the error is Be_0 = λe_0. After k steps the error is
B^k e_0 = λ^k e_0. If we start with an eigenvector, we continue with that eigenvector, and it
grows or decays with the powers λ^k. This factor λ^k goes to zero when |λ| < 1. Since this
condition is required of every eigenvalue, we need ρ = |λ|max < 1.

To see why |λ|max < 1 is sufficient for the error to approach zero, suppose e_0 is a
combination of eigenvectors:

$$e_0 = c_1x_1 + \cdots + c_nx_n \quad\text{leads to}\quad e_k = c_1(\lambda_1)^k x_1 + \cdots + c_n(\lambda_n)^k x_n. \tag{4}$$

This is the point of eigenvectors! They grow independently, each one controlled by its
eigenvalue. When we multiply by B, the eigenvector x_i is multiplied by λ_i. If all |λ_i| < 1
then equation (4) ensures that e_k goes to zero.


Example 1

$$B = \begin{bmatrix} .6 & .5 \\ .6 & .5 \end{bmatrix} \text{ has } \lambda_{\max} = 1.1 \qquad B' = \begin{bmatrix} .6 & 1.1 \\ 0 & .5 \end{bmatrix} \text{ has } \lambda_{\max} = .6$$

B² is 1.1 times B. Then B³ is (1.1)² times B. The powers of B will blow up.

Contrast with the powers of B′. The matrix (B′)^k has (.6)^k and (.5)^k on its diagonal.
The off-diagonal entries also involve ρ^k = (.6)^k, which sets the speed of convergence.
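
The contrast between growing and decaying powers can be watched directly. This is a small Python sketch, not from the book: it assumes the two matrices of the example are B = [.6 .5; .6 .5] and B′ = [.6 1.1; 0 .5], and the helper names matmul and power are hypothetical.

```python
def matmul(X, Y):
    """Multiply two square matrices stored as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def power(B, k):
    """Return B^k for k >= 1 by repeated multiplication."""
    P = B
    for _ in range(k - 1):
        P = matmul(P, B)
    return P

B = [[0.6, 0.5], [0.6, 0.5]]        # assumed B: eigenvalues 1.1 and 0, powers grow
B_prime = [[0.6, 1.1], [0.0, 0.5]]  # assumed B': eigenvalues 0.6 and 0.5, powers decay

for k in (1, 10, 50):
    # Print one entry of B^k (growing) and the off-diagonal entry of (B')^k (decaying).
    print(k, power(B, k)[0][0], power(B_prime, k)[0][1])
```

The large off-diagonal entry 1.1 in B′ delays the decay for a few steps, but the factor (.6)^k wins in the end.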


Note There is a technical difficulty when B does not have n independent eigenvectors. (To
produce this effect in B′, change .5 to .6.) The starting error e_0 may not be a combination
of eigenvectors; there are too few for a basis. Then diagonalization is impossible and
equation (4) is not correct. We turn to the Jordan form when eigenvectors are missing:

$$\text{Jordan form } J \qquad B = MJM^{-1} \quad\text{and}\quad B^k = MJ^kM^{-1}. \tag{5}$$

Section 6.6 shows how J and J^k are made of “blocks” with one repeated eigenvalue:

$$\text{The powers of a 2 by 2 block in } J \text{ are} \qquad \begin{bmatrix} \lambda & 1 \\ 0 & \lambda \end{bmatrix}^k = \begin{bmatrix} \lambda^k & k\lambda^{k-1} \\ 0 & \lambda^k \end{bmatrix}.$$

If |λ| < 1 then these powers approach zero. The extra factor k from a double eigenvalue is
overwhelmed by the decreasing factor λ^{k−1}. This applies to all Jordan blocks. A block of
size S + 1 has k^S λ^{k−S} in J^k, which also approaches zero when |λ| < 1.


Diagonalizable or not: Convergence B^k → 0 and its speed depend on ρ = |λ|max < 1.


Jacobi versus Gauss-Seidel 


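We now solve a specific 2 by 2 problem. Watch for that number |λ|max.

$$Ax = b \qquad \begin{aligned} 2u - v &= 4 \\ -u + 2v &= -2 \end{aligned} \qquad\text{has the solution}\quad \begin{bmatrix} u \\ v \end{bmatrix} = \begin{bmatrix} 2 \\ 0 \end{bmatrix}. \tag{6}$$

The first splitting is Jacobi’s method. Keep the diagonal of A on the left side (this is S).
Move the off-diagonal part of A to the right side (this is T). Then iterate:

$$\text{Jacobi iteration} \qquad Sx_{k+1} = Tx_k + b \qquad \begin{aligned} 2u_{k+1} &= v_k + 4 \\ 2v_{k+1} &= u_k - 2. \end{aligned}$$

Start from u_0 = v_0 = 0. The first step finds u_1 = 2 and v_1 = −1. Keep going:

$$\begin{bmatrix} 0 \\ 0 \end{bmatrix} \quad \begin{bmatrix} 2 \\ -1 \end{bmatrix} \quad \begin{bmatrix} 3/2 \\ 0 \end{bmatrix} \quad \begin{bmatrix} 2 \\ -1/4 \end{bmatrix} \quad \begin{bmatrix} 15/8 \\ 0 \end{bmatrix} \quad \begin{bmatrix} 2 \\ -1/16 \end{bmatrix} \quad\text{approaches}\quad \begin{bmatrix} 2 \\ 0 \end{bmatrix}.$$

This shows convergence. At steps 1, 3, 5 the second component is −1, −1/4, −1/16. The
error is multiplied by 1/4 every two steps. The components 0, 3/2, 15/8 have errors 2, 1/2, 1/8.
Those also drop by 4 in each two steps. The error equation is Se_{k+1} = Te_k:

$$\text{Error equation} \qquad \begin{bmatrix} 2 & 0 \\ 0 & 2 \end{bmatrix} e_{k+1} = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} e_k \quad\text{or}\quad e_{k+1} = \begin{bmatrix} 0 & \tfrac12 \\ \tfrac12 & 0 \end{bmatrix} e_k. \tag{7}$$

That last matrix S^{-1}T has eigenvalues 1/2 and −1/2. So its spectral radius is ρ(B) = 1/2:

$$B = S^{-1}T = \begin{bmatrix} 0 & \tfrac12 \\ \tfrac12 & 0 \end{bmatrix} \text{ has } |\lambda|_{\max} = \tfrac12 \quad\text{and}\quad \begin{bmatrix} 0 & \tfrac12 \\ \tfrac12 & 0 \end{bmatrix}^2 = \begin{bmatrix} \tfrac14 & 0 \\ 0 & \tfrac14 \end{bmatrix}.$$

Two steps multiply the error by 1/4 exactly, in this special example. The important message
is this: Jacobi’s method works well when the main diagonal of A is large compared to the
off-diagonal part. The diagonal part is S, the rest is −T. We want the diagonal to dominate
and S^{-1}T to be small.

The eigenvalue λ = 1/2 is unusually small. Ten iterations reduce the error by
2^10 = 1024. More typical and more expensive is |λ|max = .99 or .999.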
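
The Jacobi iterates for this 2 by 2 system can be reproduced in a few lines. This is a sketch in plain Python rather than code from the book; it assumes the system is 2u − v = 4, −u + 2v = −2, the function name jacobi_step is hypothetical, and exact fractions are used so the iterates print as fractions.

```python
from fractions import Fraction

def jacobi_step(u, v):
    """One Jacobi step for 2u - v = 4, -u + 2v = -2 (old u and v on the right side)."""
    return (v + 4) / 2, (u - 2) / 2

u, v = Fraction(0), Fraction(0)
iterates = [(u, v)]
for _ in range(5):
    u, v = jacobi_step(u, v)      # both new values come from the old pair
    iterates.append((u, v))

for k, (uk, vk) in enumerate(iterates):
    # The exact solution is (2, 0), so the errors are 2 - u_k and 0 - v_k.
    print(k, uk, vk, "errors:", 2 - uk, -vk)
```

Every second step the errors shrink by exactly 4, which matches the spectral radius 1/2.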
The Gauss-Seidel method keeps the whole lower triangular part of A as S:

$$\text{Gauss-Seidel} \qquad \begin{aligned} 2u_{k+1} &= v_k + 4 \\ -u_{k+1} + 2v_{k+1} &= -2 \end{aligned} \qquad\text{or}\qquad \begin{aligned} u_{k+1} &= \tfrac12 v_k + 2 \\ v_{k+1} &= \tfrac12 u_{k+1} - 1. \end{aligned}$$

Notice the change. The new u_{k+1} from the first equation is used immediately in the second
equation. With Jacobi, we saved the old u_k until the whole step was complete. With
Gauss-Seidel, the new values enter right away and the old u_k is destroyed. This cuts the storage in
half. It also speeds up the iteration (usually). And it costs no more than the Jacobi method.

Starting from (0, 0), the exact answer (2, 0) is reached in one step. That is an accident
I did not expect. Test the iteration from another start u_0 = 0 and v_0 = −1:

$$\begin{bmatrix} 0 \\ -1 \end{bmatrix} \quad \begin{bmatrix} 3/2 \\ -1/4 \end{bmatrix} \quad \begin{bmatrix} 15/8 \\ -1/16 \end{bmatrix} \quad \begin{bmatrix} 63/32 \\ -1/64 \end{bmatrix} \quad\text{approaches}\quad \begin{bmatrix} 2 \\ 0 \end{bmatrix}.$$

The errors in the first component are 2, 1/2, 1/8, 1/32. The errors in the second component
are −1, −1/4, −1/16, −1/64. We divide by 4 in one step not two steps. Gauss-Seidel is
twice as fast as Jacobi. We have ρ_GS = (ρ_J)².

This double speed is true for every positive definite tridiagonal matrix. Anything is
possible when A is strongly nonsymmetric: Jacobi is sometimes better, and both methods
might fail. Our example is small and A is positive definite tridiagonal:

$$S = \begin{bmatrix} 2 & 0 \\ -1 & 2 \end{bmatrix} \quad\text{and}\quad T = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} \quad\text{and}\quad S^{-1}T = \begin{bmatrix} 0 & \tfrac12 \\ 0 & \tfrac14 \end{bmatrix}.$$

The Gauss-Seidel eigenvalues are 0 and 1/4. Compare with 1/2 and −1/2 for Jacobi.
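
The only change from Jacobi is that the new u is used at once. A short Python sketch (not the book’s code, with the hypothetical name gauss_seidel_step and the same assumed system 2u − v = 4, −u + 2v = −2) shows the errors dropping by 4 at every step:

```python
from fractions import Fraction

def gauss_seidel_step(u, v):
    """One Gauss-Seidel step: the new u is used at once in the second equation."""
    u_new = v / 2 + 2          # from 2u - v = 4 with the old v
    v_new = u_new / 2 - 1      # from -u + 2v = -2 with the new u
    return u_new, v_new

u, v = Fraction(0), Fraction(-1)   # the second starting point in the text
iterates = [(u, v)]
for _ in range(3):
    u, v = gauss_seidel_step(u, v)
    iterates.append((u, v))

for k, (uk, vk) in enumerate(iterates):
    print(k, uk, vk, "errors:", 2 - uk, -vk)
```

Notice that the old u is never needed after u_new is computed, which is the storage saving mentioned above.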




With a small push we can explain the successive overrelaxation method (SOR). The new
idea is to introduce a parameter ω (omega) into the iteration. Then choose this number ω
to make the spectral radius of S^{-1}T as small as possible.

Rewrite Ax = b as ωAx = ωb. The matrix S in SOR has the diagonal of the
original A, but below the diagonal we use ωA. On the right side T is S − ωA:

$$\text{SOR} \qquad \begin{aligned} 2u_{k+1} &= (2 - 2\omega)u_k + \omega v_k + 4\omega \\ -\omega u_{k+1} + 2v_{k+1} &= (2 - 2\omega)v_k - 2\omega. \end{aligned} \tag{9}$$

This looks more complicated to us, but the computer goes as fast as ever. Each new u_{k+1}
from the first equation is used immediately to find v_{k+1} in the second equation. This is like
Gauss-Seidel, with an adjustable number ω. The key matrix is S^{-1}T:

$$\text{SOR iteration matrix} \qquad S^{-1}T = \begin{bmatrix} 1 - \omega & \tfrac12\omega \\ \tfrac12\omega(1 - \omega) & 1 - \omega + \tfrac14\omega^2 \end{bmatrix}. \tag{10}$$

The determinant is (1 − ω)². At the best ω, both eigenvalues turn out to equal 7 − 4√3,
which is close to (1/4)². Therefore SOR is twice as fast as Gauss-Seidel in this example. In
other examples SOR can converge ten or a hundred times as fast.
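
The effect of ω on the spectral radius can be tested with a few lines of Python. This is a hedged sketch, not the book’s code. It assumes S = [2 0; −ω 2] and T = S − ωA for the 2 by 2 example, so that S^{-1}T has rows (1 − ω, ω/2) and (ω(1 − ω)/2, 1 − ω + ω²/4). The value ω = 8 − 4√3 is not stated in the text; it comes from asking both eigenvalues to equal ω − 1 = 7 − 4√3. The function names are hypothetical.

```python
import cmath
import math

def sor_matrix(w):
    """S^{-1}T for the 2 by 2 example, assuming S = [[2, 0], [-w, 2]] and T = S - w*A."""
    return [[1 - w, w / 2],
            [w * (1 - w) / 2, 1 - w + w * w / 4]]

def spectral_radius_2x2(B):
    """Largest |eigenvalue| of a 2 by 2 matrix from its trace and determinant."""
    trace = B[0][0] + B[1][1]
    det = B[0][0] * B[1][1] - B[0][1] * B[1][0]
    root = cmath.sqrt(trace * trace - 4 * det)   # complex sqrt covers complex pairs
    return max(abs((trace + root) / 2), abs((trace - root) / 2))

w_best = 8 - 4 * math.sqrt(3)       # assumed best omega, about 1.07
for w in (1.0, w_best, 1.2):
    print(w, spectral_radius_2x2(sor_matrix(w)))
```

With ω = 1 this is Gauss-Seidel (radius 1/4). Going past the best ω makes the eigenvalues complex, and the radius |ω − 1| starts to grow again.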
I will put on record the most valuable test matrix of order n. It is our favorite −1, 2,
−1 tridiagonal matrix K. The diagonal is 2I. Below and above are −1’s. Our example had
n = 2, which leads to cos(π/3) = 1/2 as the Jacobi eigenvalue found above. Notice especially
that this eigenvalue is squared for Gauss-Seidel:


The splittings of the −1, 2, −1 matrix K of order n yield these eigenvalues of B:

$$\text{Jacobi } (S = 0, 2, 0 \text{ matrix}): \qquad S^{-1}T \text{ has } |\lambda|_{\max} = \cos\frac{\pi}{n+1}$$

$$\text{Gauss-Seidel } (S = -1, 2, 0 \text{ matrix}): \qquad S^{-1}T \text{ has } |\lambda|_{\max} = \left(\cos\frac{\pi}{n+1}\right)^2$$

$$\text{SOR (with the best } \omega): \qquad S^{-1}T \text{ has } |\lambda|_{\max} = \left(\cos\frac{\pi}{n+1}\right)^2 \Big/ \left(1 + \sin\frac{\pi}{n+1}\right)^2$$


Let me be clear: For the −1, 2, −1 matrix you should not use any of these iterations!
Elimination is very fast (exact LU). Iterations are intended for large sparse matrices,
when a high percentage of the entries are zero. The unhelpful zeros are inside the band,
which is wide. They become nonzero in the exact L and U, which is why elimination
becomes expensive.

We mention one more splitting. The idea of “incomplete LU” is to set the small
nonzeros in L and U back to zero. This leaves triangular matrices L_0 and U_0 which are
again sparse. The splitting has S = L_0U_0 on the left side. Each step is quick:

$$\text{Incomplete LU} \qquad L_0U_0x_{k+1} = (L_0U_0 - A)x_k + b.$$


On the right side we do sparse matrix-vector multiplications. Don’t multiply L_0 times U_0,
those are matrices. Multiply x_k by U_0 and then multiply that vector by L_0. On the left side
we do forward and back substitutions. If L_0U_0 is close to A, then |λ|max is small. A few
iterations will give a close answer.


Multigrid and Conjugate Gradients 


I cannot leave the impression that Jacobi and Gauss-Seidel are great methods. Generally 
the “low-frequency” part of the error decays very slowly, and many iterations are needed. 
Here are two ideas that bring tremendous improvement. Multigrid can solve problems of 
size n in O(n) steps. With a good preconditioner, conjugate gradients becomes one of the 
most popular and powerful algorithms in numerical linear algebra. 


Multigrid Solve smaller problems (often coming from coarser grids and doubled
stepsizes Δx and Δy). Each iteration will be cheaper and convergence will be faster. Then
interpolate between the values computed on the coarse grid to get a quick and close head
start on the full-size problem. Multigrid might go 4 levels down and back.

Conjugate gradients An ordinary iteration like x_{k+1} = x_k − Ax_k + b involves
multiplication by A at each step. If A is sparse, this is not too expensive: Ax_k is what we
are willing to do. It adds one more basis vector to the growing “Krylov spaces” that
contain our approximations. But x_{k+1} is not the best combination of x_0, Ax_0, ..., A^k x_0.
The ordinary iterations are simple but far from optimal.

The conjugate gradient method chooses the best combination x_k at every step. The
extra cost (beyond one multiplication by A) is not great. We will give the CG iteration,
emphasizing that this method was created for a symmetric positive definite matrix. When
A is not symmetric, one good choice is GMRES. When A = A^T is not positive definite,
there is MINRES. A world of high-powered iterative methods has been created around the
idea of making optimal choices of each successive x_k.


My textbook Computational Science and Engineering describes multigrid and CG in 
much more detail. Among books on numerical linear algebra, Trefethen-Bau is deservedly 
popular (others are terrific too). Golub-Van Loan is a level up. 

The Problem Set reproduces the five steps in each conjugate gradient cycle from x_{k−1}
to x_k. We compute that new approximation x_k, the new residual r_k = b − Ax_k, and the
new search direction d_k to look for the next x_{k+1}.

I wrote those steps for the original matrix A. But a preconditioner S can make
convergence much faster. Our original equation is Ax = b. The preconditioned equation is
S^{-1}Ax = S^{-1}b. Small changes in the code give the preconditioned conjugate gradient
method, the leading iterative method to solve positive definite systems.
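
For orientation, here is the widely used textbook form of the (unpreconditioned) conjugate gradient cycle, written as a plain Python sketch. It is not copied from the Problem Set, so the ordering and naming of the five steps there may differ; the function names are hypothetical and the 2 by 2 test system is the one used earlier for Jacobi and Gauss-Seidel.

```python
def matvec(A, x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def dot(x, y):
    return sum(xi * yi for xi, yi in zip(x, y))

def conjugate_gradient(A, b, steps):
    """Plain CG for symmetric positive definite A, starting from x = 0."""
    x = [0.0] * len(b)
    r = list(b)              # residual b - Ax for x = 0
    d = list(r)              # first search direction
    for _ in range(steps):
        rr = dot(r, r)
        if rr == 0.0:
            break            # exact solution already reached
        Ad = matvec(A, d)
        alpha = rr / dot(d, Ad)                           # step length
        x = [xi + alpha * di for xi, di in zip(x, d)]     # new approximation
        r = [ri - alpha * adi for ri, adi in zip(r, Ad)]  # new residual
        beta = dot(r, r) / rr                             # improvement this step
        d = [ri + beta * di for ri, di in zip(r, d)]      # new search direction
    return x

A = [[2.0, -1.0], [-1.0, 2.0]]
b = [4.0, -2.0]
print(conjugate_gradient(A, b, steps=2))   # close to the exact solution (2, 0)
```

In exact arithmetic CG finishes in at most n steps for an n by n matrix; here n = 2. The only multiplication by A in each cycle is the product Ad.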

The biggest competition is direct elimination, with the equations reordered to take
maximum advantage of many zeros in A. It is not easy to outperform Gauss.


Iterative Methods for Eigenvalues 


We move from Ax = b to Ax = λx. Iterations are an option for linear equations. They
are a necessity for eigenvalue problems. The eigenvalues of an n by n matrix are the roots
of an nth degree polynomial. The determinant of A − λI starts with (−λ)^n. This book
must not leave the impression that eigenvalues should be computed that way! Working
from det(A − λI) = 0 is a very poor approach, except when n is small.

For n > 4 there is no formula to solve det(A − λI) = 0. Worse than that, the λ’s
can be very unstable and sensitive. It is much better to work with A itself, gradually
making it diagonal or triangular. (Then the eigenvalues appear on the diagonal.) Good computer
codes are available in the LAPACK library; individual routines are free on
www.netlib.org/lapack. This library combines the earlier LINPACK and EISPACK, with
many improvements (to use matrix-matrix operations in the Level 3 BLAS). It is a
collection of Fortran 77 programs for linear algebra on high-performance computers. For your
computer and mine, a high quality matrix package is all we need. For supercomputers with
parallel processing, move to ScaLAPACK and block elimination.


We will briefly discuss the power method and the QR method (chosen by LAPACK)
for computing eigenvalues. It makes no sense to give full details of the codes.

1 Power methods and inverse power methods. Start with any vector u_0. Multiply by
A to find u_1. Multiply by A again to find u_2. If u_0 is a combination of the eigenvectors,
then A multiplies each eigenvector x_i by λ_i. After k steps we have (λ_i)^k:

$$u_k = A^k u_0 = c_1(\lambda_1)^k x_1 + \cdots + c_n(\lambda_n)^k x_n. \tag{11}$$

As the power method continues, the largest eigenvalue begins to dominate. The vectors
u_k point toward that dominant eigenvector. We saw this for Markov matrices in Chapter 8:

$$A = \begin{bmatrix} .9 & .3 \\ .1 & .7 \end{bmatrix} \text{ has } \lambda_{\max} = 1 \text{ with eigenvector } \begin{bmatrix} .75 \\ .25 \end{bmatrix}.$$

Start with u_0 and multiply at every step by A:

$$u_0 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \quad u_1 = \begin{bmatrix} .9 \\ .1 \end{bmatrix}, \quad u_2 = \begin{bmatrix} .84 \\ .16 \end{bmatrix} \quad\text{is approaching}\quad u_\infty = \begin{bmatrix} .75 \\ .25 \end{bmatrix}.$$

The speed of convergence depends on the ratio of the second largest eigenvalue λ_2 to the
largest λ_1. We don’t want λ_1 to be small, we want λ_2/λ_1 to be small. Here λ_2 = .6 and
λ_1 = 1, giving good speed. For large matrices it often happens that |λ_2/λ_1| is very close
to 1. Then the power method is too slow.
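
The Markov example takes only a few lines to run. This is a minimal Python sketch, not the book’s code; it assumes the matrix is A = [.9 .3; .1 .7] with u_0 = (1, 0), and the helper name matvec is hypothetical. Because λ_max = 1 here, no rescaling of u_k is needed; for a general matrix each u_k would normally be normalized.

```python
def matvec(A, x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

A = [[0.9, 0.3], [0.1, 0.7]]    # assumed Markov matrix with eigenvalues 1 and 0.6
u = [1.0, 0.0]                  # u_0
history = [u]
for _ in range(40):
    u = matvec(A, u)            # u_{k+1} = A u_k
    history.append(u)

print(history[1], history[2])   # close to (.9, .1) and (.84, .16)
print(history[-1])              # close to the eigenvector (.75, .25)
```

The distance from u_k to (.75, .25) shrinks by the factor λ_2/λ_1 = .6 at every step.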

Is there a way to find the smallest eigenvalue, which is often the most important in
applications? Yes, by the inverse power method: Multiply u_0 by A^{-1} instead of A. Since
we never want to compute A^{-1}, we actually solve Au_1 = u_0. By saving the LU factors,
the next step Au_2 = u_1 is fast. Step k has Au_k = u_{k−1}:

$$\text{Inverse power method} \qquad u_k = A^{-k}u_0 = \frac{c_1x_1}{(\lambda_1)^k} + \cdots + \frac{c_nx_n}{(\lambda_n)^k}. \tag{12}$$

Now the smallest eigenvalue λ_min is in control. When it is very small, the factor 1/(λ_min)^k is
large. For high speed, we make λ_min even smaller by shifting the matrix to A − λ*I.

That shift doesn’t change the eigenvectors. (λ* might come from the diagonal of A,
even better is a Rayleigh quotient x^T Ax/x^T x.) If λ* is close to λ_min then (A − λ*I)^{-1}
has the very large eigenvalue (λ_min − λ*)^{-1}. Each shifted inverse power step multiplies the
eigenvector by this big number, and that eigenvector quickly dominates.
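
A shifted inverse power step is just a linear solve followed by a rescaling. The Python sketch below is illustrative only: it reuses the Markov matrix A = [.9 .3; .1 .7] (eigenvalues 1 and .6), picks the shift λ* = 0.5 as an arbitrary assumption, and solves each 2 by 2 system by Cramer’s rule instead of saved LU factors. The function names are hypothetical.

```python
def solve_2x2(M, rhs):
    """Solve a 2 by 2 system by Cramer's rule (fine for a tiny example)."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [(rhs[0] * d - b * rhs[1]) / det, (a * rhs[1] - c * rhs[0]) / det]

def shifted_inverse_power(A, shift, u, steps):
    """Repeat: solve (A - shift*I) u_new = u, then rescale u_new."""
    M = [[A[0][0] - shift, A[0][1]], [A[1][0], A[1][1] - shift]]
    for _ in range(steps):
        u = solve_2x2(M, u)
        biggest = max(abs(u[0]), abs(u[1]))
        u = [ui / biggest for ui in u]      # keep the vector at a safe size
    # The Rayleigh quotient u^T A u / u^T u estimates the eigenvalue.
    Au = [A[0][0] * u[0] + A[0][1] * u[1], A[1][0] * u[0] + A[1][1] * u[1]]
    lam = (u[0] * Au[0] + u[1] * Au[1]) / (u[0] * u[0] + u[1] * u[1])
    return lam, u

A = [[0.9, 0.3], [0.1, 0.7]]
lam, u = shifted_inverse_power(A, shift=0.5, u=[1.0, 0.0], steps=20)
print(lam, u)    # eigenvalue near 0.6, eigenvector along (1, -1)
```

With this shift the two eigenvalues of (A − λ*I)^{-1} are 10 and 2, so the wanted eigenvector gains a factor of 5 at every step.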


2 The QR Method This is a major achievement in numerical linear algebra. Fifty years
ago, eigenvalue computations were slow and inaccurate. We didn’t even realize that
solving det(A − λI) = 0 was a terrible method. Jacobi had suggested earlier that A should
gradually be made triangular; then the eigenvalues appear automatically on the diagonal.
He used 2 by 2 rotations to produce off-diagonal zeros. (Unfortunately the previous zeros
can become nonzero again. But Jacobi’s method made a partial comeback with parallel
computers.) At present the QR method is the leader in eigenvalue computations and we
describe it briefly.

The basic step is to factor A, whose eigenvalues we want, into QR. Remember from
Gram-Schmidt (Section 4.4) that Q has orthonormal columns and R is triangular. For
eigenvalues the key idea is: Reverse Q and R. The new matrix (same λ’s) is A_1 = RQ.
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The eigenvalues are not changed in RQ because A = QR is similar to Ay = QT! AQ: 
A, = RỌ has the same A ORx =Ax gives RO(Q7!x) =A(Q7!x). (13) 


This process continues. Factor the new matrix $A_1$ into $Q_1R_1$. Then reverse the factors to $R_1Q_1$. This is the similar matrix $A_2$ and again no change in the eigenvalues. Amazingly, those eigenvalues begin to show up on the diagonal. Often the last entry of $A_4$ holds an accurate eigenvalue. In that case we remove the last row and column and continue with a smaller matrix to find the next eigenvalue.
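The "factor, then reverse" step can be tried on a 2 by 2 matrix, where $Q$ is a single plane rotation. This is a sketch only: the function name `qr_step_2x2` and the symmetric test matrix (eigenvalues 3 and 1) are assumptions made for illustration, and no shift is used.

```python
import math

def qr_step_2x2(A):
    """One step A = QR -> RQ for a 2 by 2 matrix, with Q a plane rotation."""
    (a, b), (c, d) = A
    r = math.hypot(a, c)
    cs, sn = a / r, c / r                 # Q = [[cs, -sn], [sn, cs]]
    # R = Q^T A is upper triangular
    R = [[r, cs * b + sn * d],
         [0.0, -sn * b + cs * d]]
    # Reverse the factors: RQ
    return [[R[0][0] * cs + R[0][1] * sn, -R[0][0] * sn + R[0][1] * cs],
            [R[1][1] * sn, R[1][1] * cs]]

A = [[2.0, -1.0], [-1.0, 2.0]]            # eigenvalues 3 and 1
first = qr_step_2x2(A)                    # after one step
Ak = A
for _ in range(40):
    Ak = qr_step_2x2(Ak)                  # diagonal approaches the eigenvalues
```

Every step is a similarity, so the trace and determinant never change while the entry below the diagonal shrinks toward zero.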

Two extra ideas make this method a success. One is to shift the matrix by a multiple of $I$, before factoring into $QR$. Then $RQ$ is shifted back:

Factor $A_k - c_kI$ into $Q_kR_k$. The next matrix is $A_{k+1} = R_kQ_k + c_kI$.

$A_{k+1}$ has the same eigenvalues as $A_k$, and the same as the original $A_0 = A$. A good shift chooses $c$ near an (unknown) eigenvalue. That eigenvalue appears more accurately on the diagonal of $A_{k+1}$, which tells us a better $c$ for the next step to $A_{k+2}$.

The other idea is to obtain off-diagonal zeros before the QR method starts. An elimination step $E$ will do it, or a Givens rotation, but don't forget $E^{-1}$ (to keep $\lambda$):

$$EAE^{-1} = \begin{bmatrix} 1 & & \\ & 1 & \\ & -1 & 1 \end{bmatrix}\begin{bmatrix} 1 & 2 & 3 \\ 1 & 4 & 5 \\ 1 & 6 & 7 \end{bmatrix}\begin{bmatrix} 1 & & \\ & 1 & \\ & 1 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 5 & 3 \\ 1 & 9 & 5 \\ 0 & 4 & 2 \end{bmatrix}. \quad \text{Same } \lambda\text{'s}.$$

We must leave those nonzeros 1 and 4 along one subdiagonal. More $E$'s could remove them, but $E^{-1}$ would fill them in again. This is a "Hessenberg matrix" (one nonzero subdiagonal). The zeros in the lower left corner will stay zero through the QR method. The operation count for each QR factorization drops from $O(n^3)$ to $O(n^2)$.

Golub and Van Loan give this example of one shifted QR step on a Hessenberg matrix. The shift is $7I$, taking 7 from all diagonal entries (then shifting back for $A_1$):

$$A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 0 & .001 & 7 \end{bmatrix} \quad \text{leads to} \quad A_1 = \begin{bmatrix} -.54 & 1.69 & 0.835 \\ .31 & 6.53 & -6.656 \\ 0 & .00002 & 7.012 \end{bmatrix}.$$

Factoring $A - 7I$ into $QR$ produced $A_1 = RQ + 7I$. Notice the very small number .00002. The diagonal entry 7.012 is almost an exact eigenvalue of $A_1$, and therefore of $A$. Another QR step on $A_1$ with shift by $7.012I$ would give terrific accuracy.
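The shifted step on this Hessenberg matrix can be reproduced with Givens rotations. The sketch below is one possible implementation, not the routine used by Golub and Van Loan; the function name `shifted_qr_step` is hypothetical. The signs of off-diagonal entries depend on the sign convention chosen for $Q$ and $R$, so only the diagonal and the magnitudes should be compared with the matrix $A_1$ in the text.

```python
import math

def shifted_qr_step(A, shift):
    """Factor A - shift*I = QR by Givens rotations and return RQ + shift*I.

    Assumes A is upper Hessenberg (zeros below the first subdiagonal).
    """
    n = len(A)
    R = [[A[i][j] - (shift if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    rotations = []
    for k in range(n - 1):
        # Rotation in rows k, k+1 that zeros the subdiagonal entry R[k+1][k]
        r = math.hypot(R[k][k], R[k + 1][k])
        c, s = (1.0, 0.0) if r == 0.0 else (R[k][k] / r, R[k + 1][k] / r)
        for j in range(n):
            top, bottom = R[k][j], R[k + 1][j]
            R[k][j] = c * top + s * bottom
            R[k + 1][j] = -s * top + c * bottom
        rotations.append((c, s))
    # Multiply R by Q: each transposed rotation now mixes columns k, k+1
    for k, (c, s) in enumerate(rotations):
        for i in range(n):
            left, right = R[i][k], R[i][k + 1]
            R[i][k] = c * left + s * right
            R[i][k + 1] = -s * left + c * right
    for i in range(n):
        R[i][i] += shift            # shift back
    return R

A = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0],
     [0.0, 0.001, 7.0]]
A1 = shifted_qr_step(A, 7.0)
```

Each rotation touches only two rows (or two columns), which is why a Hessenberg matrix needs $O(n^2)$ work per factorization.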


For large sparse matrices I would look to ARPACK. Problems 27-29 describe the Arnoldi iteration that orthogonalizes the basis. Each step has only three terms when $A$ is symmetric. The matrix becomes tridiagonal and still orthogonally similar to the original $A$: a wonderful start for computing eigenvalues.

Problem Set 9.3


Problems 1-12 are about iterative methods for Ax = b. 


1 Change $Ax = b$ to $x = (I - A)x + b$. What are $S$ and $T$ for this splitting? What matrix $S^{-1}T$ controls the convergence of $x_{k+1} = (I - A)x_k + b$?


2 If $\lambda$ is an eigenvalue of $A$, then ____ is an eigenvalue of $B = I - A$. The real eigenvalues of $B$ have absolute value less than 1 if the real eigenvalues of $A$ lie between ____ and ____.


3 Show why the iteration $x_{k+1} = (I - A)x_k + b$ does not converge for $A = \begin{bmatrix} 2 & -1 \\ -1 & 2 \end{bmatrix}$.


4 Why is the norm of $B^k$ never larger than $\|B\|^k$? Then $\|B\| < 1$ guarantees that the powers $B^k$ approach zero (convergence). No surprise since $|\lambda|_{\max}$ is below $\|B\|$.


5 If $A$ is singular then all splittings $A = S - T$ must fail. From $Ax = 0$ show that $S^{-1}Tx = x$. So this matrix $B = S^{-1}T$ has $\lambda = 1$ and fails.


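6 Change the 2's to 3's and find the eigenvalues of $S^{-1}T$ for Jacobi's method:

$$Sx_{k+1} = Tx_k + b \quad \text{is} \quad \begin{bmatrix} 3 & 0 \\ 0 & 3 \end{bmatrix}x_{k+1} = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}x_k + b.$$

7 Find the eigenvalues of $S^{-1}T$ for the Gauss-Seidel method applied to Problem 6:

$$\begin{bmatrix} 3 & 0 \\ -1 & 3 \end{bmatrix}x_{k+1} = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}x_k + b.$$

Does $|\lambda|_{\max}$ for Gauss-Seidel equal $|\lambda|^2_{\max}$ for Jacobi?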


8 For any 2 by 2 matrix $\begin{bmatrix} a & b \\ c & d \end{bmatrix}$ show that $|\lambda|_{\max}$ equals $|bc/ad|$ for Gauss-Seidel and $|bc/ad|^{1/2}$ for Jacobi. We need $ad \neq 0$ for the matrix $S$ to be invertible.


9 The best $\omega$ produces two equal eigenvalues for $S^{-1}T$ in the SOR method. Those eigenvalues are $\omega - 1$ because the determinant is $(\omega - 1)^2$. Set the trace in equation (10) equal to $(\omega - 1) + (\omega - 1)$ and find this optimal $\omega$.


10 Write a computer code (MATLAB or other) for the Gauss-Seidel method. You can define $S$ and $T$ from $A$, or set up the iteration loop directly from the entries $a_{ij}$. Test it on the $-1, 2, -1$ matrices $A$ of order 10, 20, 50 with $b = (1, 0, \ldots, 0)$.


11 The Gauss-Seidel iteration at component $i$ uses earlier parts of $x^{\text{new}}$:

$$\textbf{Gauss-Seidel} \qquad x_i^{\text{new}} = x_i^{\text{old}} + \frac{1}{a_{ii}}\Big(b_i - \sum_{j=1}^{i-1} a_{ij}x_j^{\text{new}} - \sum_{j=i}^{n} a_{ij}x_j^{\text{old}}\Big).$$

If every $x_i^{\text{new}} = x_i^{\text{old}}$ how does this show that the solution $x$ is correct? How does the formula change for Jacobi's method? For SOR insert $\omega$ outside the parentheses.

12 The SOR splitting matrix $S$ is the same as for Gauss-Seidel except that the diagonal is divided by $\omega$. Write a program for SOR on an $n$ by $n$ matrix. Apply it with $\omega = 1$, 1.4, 1.8, 2.2 when $A$ is the $-1, 2, -1$ matrix of order $n = 10$.


13 Divide equation (11) by $\lambda_1^k$ and explain why $|\lambda_2/\lambda_1|$ controls the convergence of the power method. Construct a matrix $A$ for which this method does not converge.


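14 The Markov matrix $A = \begin{bmatrix} .9 & .3 \\ .1 & .7 \end{bmatrix}$ has $\lambda = 1$ and $.6$, and the power method $u_k = A^ku_0$ converges to $\begin{bmatrix} .75 \\ .25 \end{bmatrix}$. Find the eigenvectors of $A^{-1}$. What does the inverse power method $u_{-k} = A^{-k}u_0$ converge to (after you multiply by $.6^k$)?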


15 The tridiagonal matrix of size $n - 1$ with diagonals $-1, 2, -1$ has eigenvalues $\lambda_j = 2 - 2\cos(j\pi/n)$. Why are the smallest eigenvalues approximately $(j\pi/n)^2$? The inverse power method converges at the speed $\lambda_1/\lambda_2 \approx 1/4$.


16 For $A = \begin{bmatrix} 2 & -1 \\ -1 & 2 \end{bmatrix}$ apply the power method $u_{k+1} = Au_k$ three times starting with $u_0 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$. What eigenvector is the power method converging to?

17 In Problem 16 apply the inverse power method $u_{k+1} = A^{-1}u_k$ three times with the same $u_0$. What eigenvector are the $u_k$'s approaching?


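18 In the QR method for eigenvalues, show that the 2,1 entry drops from $\sin\theta$ in $A = QR$ to $-\sin^3\theta$ in $RQ$. (Compute $R$ and $RQ$.) This "cubic convergence" makes the method a success:

$$A = \begin{bmatrix} \cos\theta & \sin\theta \\ \sin\theta & 0 \end{bmatrix} = QR = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}\begin{bmatrix} 1 & ? \\ 0 & ? \end{bmatrix}.$$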


19 If $A$ is an orthogonal matrix, its QR factorization has $Q = $ ____ and $R = $ ____. Therefore $RQ = $ ____. These are among the rare examples when the QR method goes nowhere.


20 The shifted QR method factors $A - cI$ into $QR$. Show that the next matrix $A_1 = RQ + cI$ equals $Q^{-1}AQ$. Therefore $A_1$ has the ____ eigenvalues as $A$ (but is closer to triangular).

21 When $A = A^T$, the "Lanczos method" finds $a$'s and $b$'s and orthonormal $q$'s so that $Aq_j = b_{j-1}q_{j-1} + a_jq_j + b_jq_{j+1}$ (with $q_0 = 0$). Multiply by $q_j^T$ to find a formula for $a_j$. The equation says that $AQ = QT$ where $T$ is a tridiagonal matrix.

22 The equation in Problem 21 develops from this loop with $b_0 = 1$ and $r_0 = $ any $q_1$:

$$q_{j+1} = r_j/b_j; \quad j = j + 1; \quad a_j = q_j^TAq_j; \quad r_j = Aq_j - b_{j-1}q_{j-1} - a_jq_j; \quad b_j = \|r_j\|.$$

Write a code and test it on the $-1, 2, -1$ matrix $A$. $Q^TQ$ should be $I$.

23 Suppose $A$ is tridiagonal and symmetric in the QR method. From $A_1 = Q^{-1}AQ$ show that $A_1$ is symmetric. Write $A_1 = RAR^{-1}$ to show that $A_1$ is also tridiagonal. (If the lower part of $A_1$ is proved tridiagonal then by symmetry the upper part is too.) Symmetric tridiagonal matrices are the best way to start in the QR method.


Questions 24-26 are about quick ways to estimate the location of the eigenvalues.

24 If the sum of $|a_{ij}|$ along every row is less than 1, explain this proof that $|\lambda| < 1$. Suppose $Ax = \lambda x$ and $|x_i|$ is larger than the other components of $x$. Then $|\sum a_{ij}x_j|$ is less than $|x_i|$. That means $|\lambda x_i| < |x_i|$ so $|\lambda| < 1$.

(Gershgorin circles) Every eigenvalue of $A$ is in one or more of $n$ circles. Each circle is centered at a diagonal entry $a_{ii}$ with radius $r_i = \sum_{j \neq i} |a_{ij}|$.

This follows from $(\lambda - a_{ii})x_i = \sum_{j \neq i} a_{ij}x_j$. If $|x_i|$ is larger than the other components of $x$, this sum is at most $r_i|x_i|$. Dividing by $|x_i|$ leaves $|\lambda - a_{ii}| \le r_i$.


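25 What bound on $|\lambda|_{\max}$ does Problem 24 give for these matrices? What are the three Gershgorin circles that contain all the eigenvalues? Those circles show immediately that $K$ is at least positive semidefinite (actually definite) and $A$ has $\lambda_{\max} = 1$.

$$A = \begin{bmatrix} .3 & .5 & .2 \\ .3 & .4 & .3 \\ .4 & .1 & .5 \end{bmatrix} \qquad K = \begin{bmatrix} 2 & -1 & 0 \\ -1 & 2 & -1 \\ 0 & -1 & 2 \end{bmatrix}.$$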


26 These matrices are diagonally dominant because each $a_{ii} > r_i = $ absolute sum along the rest of row $i$. From the Gershgorin circles containing all $\lambda$'s, show that diagonally dominant matrices are invertible.


‘) 1 3 4 4 2 
A=|].3 1 5 A=|1 3 
4°65 1 2 2 




Problems 27-30 present two fundamental iterations. Each step involves $Aq$ or $Ad$.

The key point for large matrices is that matrix-vector multiplication is much faster than matrix-matrix multiplication. A crucial construction starts with a vector $b$. Repeated multiplication will produce $Ab, A^2b, \ldots$ but those vectors are far from orthogonal. The "Arnoldi iteration" creates an orthonormal basis $q_1, q_2, \ldots$ for the same space by the Gram-Schmidt idea: orthogonalize each new $Aq_n$ against the previous $q_1, \ldots, q_{n-1}$. The "Krylov space" spanned by $b, Ab, \ldots, A^{n-1}b$ then has a much better basis $q_1, \ldots, q_n$.

Here in pseudocode are two of the most important algorithms in numerical linear algebra: Arnoldi gives a good basis and CG gives a good approximation to $x = A^{-1}b$.

Arnoldi Iteration
    q_1 = b/||b||
    for n = 1 to N - 1
        v = A q_n
        for j = 1 to n
            h_jn = q_j^T v
            v = v - h_jn q_j
        h_{n+1,n} = ||v||
        q_{n+1} = v/h_{n+1,n}

Conjugate Gradient Iteration for Positive Definite A
    x_0 = 0, r_0 = b, d_0 = r_0
    for n = 1 to N
        alpha_n = (r_{n-1}^T r_{n-1})/(d_{n-1}^T A d_{n-1})    % step length x_{n-1} to x_n
        x_n = x_{n-1} + alpha_n d_{n-1}                        % approximate solution
        r_n = r_{n-1} - alpha_n A d_{n-1}                      % new residual b - A x_n
        beta_n = (r_n^T r_n)/(r_{n-1}^T r_{n-1})               % improvement this step
        d_n = r_n + beta_n d_{n-1}                             % next search direction

% Notice: only 1 matrix-vector multiplication Aq and Ad


For conjugate gradients, the residuals $r_n$ are orthogonal and the search directions are $A$-orthogonal: all $d_j^TAd_k = 0$. The iteration solves $Ax = b$ by minimizing the error $e^TAe$ over all vectors in the Krylov subspace. It is a fantastic algorithm.
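The conjugate gradient loop translates almost line by line into code. The sketch below uses plain Python lists and a small dense $-1, 2, -1$ matrix; the helper names and the early-exit tolerance `1e-30` are assumptions for illustration, and a serious code would use a sparse matrix-vector product.

```python
def second_difference(n):
    """The n by n matrix with -1, 2, -1 on its three central diagonals."""
    return [[2.0 if i == j else -1.0 if abs(i - j) == 1 else 0.0
             for j in range(n)] for i in range(n)]

def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def conjugate_gradient(A, b, steps):
    """Follow the CG loop from the pseudocode; one product A*d per step."""
    x = [0.0] * len(b)
    r = list(b)            # residual b - A x, with x = 0
    d = list(r)            # first search direction
    for _ in range(steps):
        rr = dot(r, r)
        if rr < 1e-30:     # residual is already negligible (assumed tolerance)
            break
        Ad = matvec(A, d)
        alpha = rr / dot(d, Ad)                          # step length
        x = [xi + alpha * di for xi, di in zip(x, d)]    # approximate solution
        r = [ri - alpha * adi for ri, adi in zip(r, Ad)] # new residual
        beta = dot(r, r) / rr                            # improvement this step
        d = [ri + beta * di for ri, di in zip(r, d)]     # next search direction
    return x

n = 10
A = second_difference(n)
b = [1.0] * n
x = conjugate_gradient(A, b, n)
residual = [bi - axi for bi, axi in zip(b, matvec(A, x))]
```

In exact arithmetic CG reaches the solution in at most $n$ steps; for this small example the computed residual is at roundoff level.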


27 For the diagonal matrix $A = \text{diag}([1\ 2\ 3\ 4])$ and the vector $b = (1, 1, 1, 1)$, go through one Arnoldi step to find the orthonormal vectors $q_1$ and $q_2$.


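28 Arnoldi's method is finding $Q$ so that $AQ = QH$ (column by column):

$$AQ = \begin{bmatrix} Aq_1 & \cdots & Aq_N \end{bmatrix} = \begin{bmatrix} q_1 & \cdots & q_N \end{bmatrix}\begin{bmatrix} h_{11} & h_{12} & \cdot & h_{1N} \\ h_{21} & h_{22} & \cdot & h_{2N} \\ 0 & h_{32} & \cdot & \cdot \\ 0 & 0 & \cdot & h_{NN} \end{bmatrix} = QH.$$

$H$ is a "Hessenberg matrix" with one nonzero subdiagonal. Here is the crucial fact when $A$ is symmetric: The matrix $H = Q^{-1}AQ = Q^TAQ$ is symmetric and therefore tridiagonal. Explain that sentence.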


29 This tridiagonal $H$ (when $A$ is symmetric) gives the Lanczos iteration:

$$\textbf{Three terms only} \qquad q_{j+1} = (Aq_j - h_{j,j}q_j - h_{j-1,j}q_{j-1})/h_{j+1,j}.$$

From $H = Q^{-1}AQ$, why are the eigenvalues of $H$ the same as the eigenvalues of $A$? For large matrices, the "Lanczos method" computes the leading eigenvalues by stopping at a smaller tridiagonal matrix $H_k$. The QR method in the text is applied to compute the eigenvalues of $H_k$.


30 Apply the conjugate gradient method to solve $Ax = b = $ ones(100, 1), where $A$ is the $-1, 2, -1$ second difference matrix A = toeplitz([2 -1 zeros(1,98)]). Graph $x_{10}$ and $x_{20}$ from CG, along with the exact solution $x$. (Its 100 components are $x_i = (ih - i^2h^2)/2$ with $h = 1/101$. "plot(i, x(i))" should produce a parabola.)


Chapter 10 


Complex Vectors and Matrices 


10.1 Complex Numbers 


A complete presentation of linear algebra must include complex numbers. Even when the matrix is real, the eigenvalues and eigenvectors are often complex. Example: A 2 by 2 rotation matrix has no real eigenvectors. Every vector in the plane turns by $\theta$, so its direction changes. But the rotation matrix has complex eigenvectors $(1, i)$ and $(1, -i)$.

Notice that those eigenvectors are connected by changing $i$ to $-i$. For a real matrix, the eigenvectors come in "conjugate pairs." The eigenvalues of rotation by $\theta$ are also conjugate complex numbers $e^{i\theta}$ and $e^{-i\theta}$. We must move from $\mathbf{R}^n$ to $\mathbf{C}^n$.


The second reason for allowing complex numbers goes beyond $\lambda$ and $x$ to the matrix $A$. The matrix itself may be complex. We will devote a whole section to the most important example, the Fourier matrix. Engineering and science and music and economics all use Fourier series. In reality the series is finite, not infinite. Computing the coefficients in $c_1e^{ix} + c_2e^{i2x} + \cdots + c_ne^{inx}$ is a linear algebra problem.

This section gives the main facts about complex numbers. It is a review for some students and a reference for everyone. Everything comes from $i^2 = -1$. The Fast Fourier Transform applies the amazing formula $e^{2\pi i} = 1$. Add angles when $e^{i\theta}$ multiplies $e^{i\theta}$:

The square of $e^{2\pi i/4} = i$ is $e^{4\pi i/4} = -1$. The fourth power of $e^{2\pi i/4}$ is $e^{2\pi i} = 1$.


Adding and Multiplying Complex Numbers 


Start with the imaginary number $i$. Everybody knows that $x^2 = -1$ has no real solution. When you square a real number, the answer is never negative. So the world has agreed on a solution called $i$. (Except that electrical engineers call it $j$.) Imaginary numbers follow the normal rules of addition and multiplication, with one difference. Replace $i^2$ by $-1$.

Add: $(3 + 2i) + (3 + 2i) = 6 + 4i$
Multiply: $(3 + 2i)(1 - i) = 3 + 2i - 3i - 2i^2 = 5 - i$.


If I add $3 + i$ to $1 - i$, the answer is 4. The real numbers $3 + 1$ stay separate from the imaginary numbers $i - i$. We are adding the vectors $(3, 1)$ and $(1, -1)$.
The number $(1 + i)^2$ is $1 + i$ times $1 + i$. The rules give the surprising answer $2i$:

$$(1 + i)(1 + i) = 1 + i + i + i^2 = 2i.$$

In the complex plane, $1 + i$ is at an angle of 45°. It is like the vector $(1, 1)$. When we square $1 + i$ to get $2i$, the angle doubles to 90°. If we square again, the answer is $(2i)^2 = -4$. The 90° angle doubled to 180°, the direction of a negative real number.
A real number is just a complex number $z = a + bi$, with zero imaginary part: $b = 0$. A pure imaginary number has $a = 0$:
The real part is $a = \text{Re}(a + bi)$. The imaginary part is $b = \text{Im}(a + bi)$.
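These rules are built into many languages. As a quick check, here is a short sketch using Python's built-in complex type, where the imaginary unit is written `1j` (the electrical engineers' notation); the variable names are illustrative only.

```python
z1 = 3 + 2j
z2 = 1 - 1j

total = z1 + z1                  # add real parts and imaginary parts separately
product = z1 * z2                # uses i^2 = -1
square = (1 + 1j) * (1 + 1j)     # the angle doubles from 45 to 90 degrees
fourth = square * square         # the angle doubles again to 180 degrees

real_part = z1.real              # Re(3 + 2i)
imag_part = z1.imag              # Im(3 + 2i)
```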


The Complex Plane 


Complex numbers correspond to points in a plane. Real numbers go along the x axis. Pure 
imaginary numbers are on the y axis. The complex number 3 + 2i is at the point with 
coordinates (3, 2). The number zero, which is 0 + Oi, is at the origin. 

Adding and subtracting complex numbers is like adding and subtracting vectors in the plane. The real component stays separate from the imaginary component. The vectors go head-to-tail as usual. The complex plane $\mathbf{C}^1$ is like the ordinary two-dimensional plane $\mathbf{R}^2$, except that we multiply complex numbers and we didn't multiply vectors.

Now comes an important idea. The complex conjugate of $3 + 2i$ is $3 - 2i$. The complex conjugate of $z = 1 - i$ is $\bar{z} = 1 + i$. In general the conjugate of $z = a + bi$ is $\bar{z} = a - bi$. (Some writers use a "bar" on the number and others use a "star": $\bar{z} = z^*$.) The imaginary parts of $z$ and "$z$ bar" have opposite signs. In the complex plane, $\bar{z}$ is the image of $z$ on the other side of the real axis.

Two useful facts. When we multiply conjugates $\bar{z}_1$ and $\bar{z}_2$, we get the conjugate of $z_1z_2$. When we add $\bar{z}_1$ and $\bar{z}_2$, we get the conjugate of $z_1 + z_2$:

$\bar{z}_1 + \bar{z}_2 = (3 - 2i) + (1 + i) = 4 - i$. This is the conjugate of $z_1 + z_2 = 4 + i$.
$\bar{z}_1 \times \bar{z}_2 = (3 - 2i) \times (1 + i) = 5 + i$. This is the conjugate of $z_1 \times z_2 = 5 - i$.

Adding and multiplying is exactly what linear algebra needs. By taking conjugates of $Ax = \lambda x$, when $A$ is real, we have another eigenvalue $\bar{\lambda}$ and its eigenvector $\bar{x}$:

$$\text{If } Ax = \lambda x \text{ and } A \text{ is real then } A\bar{x} = \bar{\lambda}\bar{x}. \tag{1}$$

[Figure 10.1 shows the complex plane with the real axis, the unit circle, and marks at $2i$ and $-2i$. The point $z = 3 + 2i$ is at distance $r = |z| = \sqrt{3^2 + 2^2}$ from the origin, and its conjugate $\bar{z} = 3 - 2i$ is its mirror image across the real axis.]

Figure 10.1: The number $z = a + bi$ corresponds to the point $(a, b)$ and the vector $\begin{bmatrix} a \\ b \end{bmatrix}$.


Something special happens when $z = 3 + 2i$ combines with its own complex conjugate $\bar{z} = 3 - 2i$. The result from adding $z + \bar{z}$ or multiplying $z\bar{z}$ is always real:

$z + \bar{z}$ = real: $(3 + 2i) + (3 - 2i) = 6$ (real)
$z\bar{z}$ = real: $(3 + 2i) \times (3 - 2i) = 9 + 6i - 6i - 4i^2 = 13$ (real).

The sum of $z = a + bi$ and its conjugate $\bar{z} = a - bi$ is the real number $2a$. The product of $z$ times $\bar{z}$ is the real number $a^2 + b^2$:

$$\textbf{Multiply } z \textbf{ times } \bar{z} \qquad (a + bi)(a - bi) = a^2 + b^2. \tag{2}$$


The next step with complex numbers is $1/z$. How to divide by $a + ib$? The best idea is to multiply by $\bar{z}/\bar{z}$. That produces $z\bar{z}$ in the denominator, which is $a^2 + b^2$:

$$\frac{1}{a + ib} = \frac{1}{a + ib}\,\frac{a - ib}{a - ib} = \frac{a - ib}{a^2 + b^2} \qquad \frac{1}{3 + 2i} = \frac{1}{3 + 2i}\,\frac{3 - 2i}{3 - 2i} = \frac{3 - 2i}{13}.$$

In case $a^2 + b^2 = 1$, this says that $(a + ib)^{-1}$ is $a - ib$. On the unit circle, $1/z$ equals $\bar{z}$. Later we will say: $1/e^{i\theta}$ is $e^{-i\theta}$ (the conjugate). A better way to multiply and divide is to use the polar form with distance $r$ and angle $\theta$.
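The division rule can be checked in a few lines. This is a sketch with Python's built-in complex numbers; the function name `reciprocal` is a hypothetical helper, not a library routine.

```python
def reciprocal(z):
    """1/z computed as conjugate(z) divided by the real number z * conjugate(z)."""
    denominator = (z * z.conjugate()).real    # a^2 + b^2
    return z.conjugate() / denominator

z = 3 + 2j
one_over_z = reciprocal(z)                    # (3 - 2i)/13

# On the unit circle the reciprocal is just the conjugate
u = 0.6 + 0.8j                                # 0.36 + 0.64 = 1
one_over_u = reciprocal(u)
```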

The Polar Form $re^{i\theta}$

The square root of $a^2 + b^2$ is $|z|$. This is the absolute value (or modulus) of the number $z = a + ib$. The square root $|z|$ is also written $r$, because it is the distance from 0 to $z$. The real number $r$ in the polar form gives the size of the complex number $z$:

The absolute value of $z = a + ib$ is $|z| = \sqrt{a^2 + b^2}$. This is called $r$.
The absolute value of $z = 3 + 2i$ is $|z| = \sqrt{3^2 + 2^2}$. This is $r = \sqrt{13}$.

The other part of the polar form is the angle $\theta$. The angle for $z = 5$ is $\theta = 0$ (because this $z$ is real and positive). The angle for $z = 3i$ is $\pi/2$ radians. The angle for a negative $z = -9$ is $\pi$ radians. The angle doubles when the number is squared. The polar form is excellent for multiplying complex numbers (not good for addition).

When the distance is $r$ and the angle is $\theta$, trigonometry gives the other two sides of the triangle. The real part (along the bottom) is $a = r\cos\theta$. The imaginary part (up or down) is $b = r\sin\theta$. Put those together, and the rectangular form becomes the polar form:

The number $z = a + ib$ is also $z = r\cos\theta + ir\sin\theta$. This is $re^{i\theta}$.

Note: $\cos\theta + i\sin\theta$ has absolute value $r = 1$ because $\cos^2\theta + \sin^2\theta = 1$. Thus $\cos\theta + i\sin\theta$ lies on the circle of radius 1, the unit circle.


Example 1 Find $r$ and $\theta$ for $z = 1 + i$ and also for the conjugate $\bar{z} = 1 - i$.

Solution The absolute value is the same for $z$ and $\bar{z}$. For $z = 1 + i$ it is $r = \sqrt{1 + 1} = \sqrt{2}$:

$$|z|^2 = 1^2 + 1^2 = 2 \quad \text{and also} \quad |\bar{z}|^2 = 1^2 + (-1)^2 = 2.$$

The distance from the center is $\sqrt{2}$. What about the angle? The number $1 + i$ is at the point $(1, 1)$ in the complex plane. The angle to that point is $\pi/4$ radians or 45°. The cosine is $1/\sqrt{2}$ and the sine is $1/\sqrt{2}$. Combining $r$ and $\theta$ brings back $z = 1 + i$:

$$r\cos\theta + ir\sin\theta = \sqrt{2}\Big(\frac{1}{\sqrt{2}}\Big) + i\sqrt{2}\Big(\frac{1}{\sqrt{2}}\Big) = 1 + i.$$

The angle to the conjugate $1 - i$ can be positive or negative. We can go to $7\pi/4$ radians which is 315°. Or we can go backwards through a negative angle, to $-\pi/4$ radians or $-45°$. If $z$ is at angle $\theta$, its conjugate $\bar{z}$ is at $2\pi - \theta$ and also at $-\theta$.

We can freely add $2\pi$ or $4\pi$ or $-2\pi$ to any angle! Those go full circles so the final point is the same. This explains why there are infinitely many choices of $\theta$. Often we select the angle between zero and $2\pi$ radians. But $-\theta$ is very useful for the conjugate $\bar{z}$.
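Example 1 can be repeated with Python's standard `cmath` module, whose `polar` function returns the pair $(r, \theta)$ and whose `rect` function goes back to $a + ib$. Note one convention that differs from the text: `cmath.polar` reports the angle between $-\pi$ and $\pi$, so the conjugate comes out at $-\pi/4$ rather than $7\pi/4$.

```python
import cmath
import math

z = 1 + 1j
r, theta = cmath.polar(z)                         # distance and angle (radians)
r_bar, theta_bar = cmath.polar(z.conjugate())     # same distance, opposite angle

# Adding 2*pi to the negative angle reaches the same point on the circle
same_point = cmath.rect(r_bar, theta_bar + 2 * math.pi)
```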




Powers and Products: Polar Form 


Computing $(1 + i)^2$ and $(1 + i)^8$ is quickest in polar form. That form has $r = \sqrt{2}$ and $\theta = \pi/4$ (or 45°). If we square the absolute value to get $r^2 = 2$, and double the angle to get $2\theta = \pi/2$ (or 90°), we have $(1 + i)^2$. For the eighth power we need $r^8$ and $8\theta$:

$$(1 + i)^8 \qquad r^8 = 2 \cdot 2 \cdot 2 \cdot 2 = 16 \quad \text{and} \quad 8\theta = 8 \cdot \frac{\pi}{4} = 2\pi.$$

This means: $(1 + i)^8$ has absolute value 16 and angle $2\pi$. The eighth power of $1 + i$ is the real number 16.
Powers are easy in polar form. So is multiplication of complex numbers.

The polar form of $z^n$ has absolute value $r^n$. The angle is $n$ times $\theta$:

$$\text{The } n\text{th power of } z = r(\cos\theta + i\sin\theta) \text{ is } z^n = r^n(\cos n\theta + i\sin n\theta). \tag{3}$$

In that case $z$ multiplies itself. In all cases, multiply $r$'s and add the angles:

$$r(\cos\theta + i\sin\theta) \text{ times } r'(\cos\theta' + i\sin\theta') = rr'(\cos(\theta + \theta') + i\sin(\theta + \theta')). \tag{4}$$
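Rules (3) and (4) are easy to test numerically. The sketch below uses `cmath.polar` and `cmath.rect` from the Python standard library; `multiply_polar` is a hypothetical helper written only to mirror rule (4).

```python
import cmath

def multiply_polar(r1, theta1, r2, theta2):
    """Rule (4): multiply the absolute values and add the angles."""
    return r1 * r2, theta1 + theta2

# Rule (3) with n = 8 applied to 1 + i
r, theta = cmath.polar(1 + 1j)              # sqrt(2) and pi/4
eighth_power = cmath.rect(r ** 8, 8 * theta)

# Rule (4) compared with ordinary multiplication
z, w = 3 + 2j, 1 - 1j
rp, tp = multiply_polar(*cmath.polar(z), *cmath.polar(w))
product = cmath.rect(rp, tp)
```

Up to roundoff, `eighth_power` is the real number 16 and `product` agrees with `z * w`.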


One way to understand this is by trigonometry. Concentrate on angles. Why do we get the double angle $2\theta$ for $z^2$?

$$(\cos\theta + i\sin\theta) \times (\cos\theta + i\sin\theta) = \cos^2\theta + i^2\sin^2\theta + 2i\sin\theta\cos\theta.$$

The real part $\cos^2\theta - \sin^2\theta$ is $\cos 2\theta$. The imaginary part $2\sin\theta\cos\theta$ is $\sin 2\theta$. Those are the "double angle" formulas. They show that $\theta$ in $z$ becomes $2\theta$ in $z^2$.

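There is a second way to understand the rule for $z^n$. It uses the only amazing formula in this section. Remember that $\cos\theta + i\sin\theta$ has absolute value 1. The cosine is made up of even powers, starting with $1 - \frac{1}{2}\theta^2$. The sine is made up of odd powers, starting with $\theta - \frac{1}{6}\theta^3$. The beautiful fact is that $e^{i\theta}$ combines both of those series into $\cos\theta + i\sin\theta$:

$$e^x = 1 + x + \frac{1}{2}x^2 + \frac{1}{6}x^3 + \cdots \quad \text{becomes} \quad e^{i\theta} = 1 + i\theta + \frac{1}{2}i^2\theta^2 + \frac{1}{6}i^3\theta^3 + \cdots$$

Write $-1$ for $i^2$ to see $1 - \frac{1}{2}\theta^2$. The complex number $e^{i\theta}$ is $\cos\theta + i\sin\theta$:

$$\textbf{Euler's Formula} \qquad e^{i\theta} = \cos\theta + i\sin\theta \quad \text{gives} \quad z = r\cos\theta + ir\sin\theta = re^{i\theta}. \tag{5}$$

The special choice $\theta = 2\pi$ gives $\cos 2\pi + i\sin 2\pi$ which is 1. Somehow the infinite series $e^{2\pi i} = 1 + 2\pi i + \frac{1}{2}(2\pi i)^2 + \cdots$ adds up to 1.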
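The claim that the series for $e^{2\pi i}$ adds up to 1 can be watched numerically. This is a small sketch; the function name `exp_series` and the choice of 40 terms are assumptions, picked so that the remaining terms are far below roundoff.

```python
import cmath
import math

def exp_series(z, terms):
    """Partial sum 1 + z + z^2/2! + ... using the given number of terms."""
    total = 0j
    term = 1 + 0j                  # current term z^k / k!
    for k in range(terms):
        total += term
        term *= z / (k + 1)
    return total

approx_full_circle = exp_series(2j * math.pi, 40)   # series for e^(2 pi i)
theta = 0.7
approx = exp_series(1j * theta, 40)                 # series for e^(i theta)
euler = complex(math.cos(theta), math.sin(theta))   # cos(theta) + i sin(theta)
```

The early terms are large (several are bigger than 40 in absolute value) before the factorials take over, which is why the sum reaching exactly 1 looks surprising.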
Now multiply $e^{i\theta}$ times $e^{i\theta'}$. Angles add for the same reason that exponents add:

$e^2$ times $e^3$ is $e^5$; $e^{i\theta}$ times $e^{i\theta}$ is $e^{2i\theta}$; $e^{i\theta}$ times $e^{i\theta'}$ is $e^{i(\theta + \theta')}$.

The powers $(re^{i\theta})^n$ are equal to $r^ne^{in\theta}$. They stay on the unit circle when $r = 1$ and $r^n = 1$. Then we find $n$ different numbers whose $n$th powers equal 1:

Set $w = e^{2\pi i/n}$. The $n$th powers of $1, w, w^2, \ldots, w^{n-1}$ all equal 1.


Those are the "$n$th roots of 1." They solve the equation $z^n = 1$. They are equally spaced around the unit circle in Figure 10.2b, where the full $2\pi$ is divided by $n$. Multiply their angles by $n$ to take $n$th powers. That gives $w^n = e^{2\pi i}$ which is 1. Also $(w^2)^n = e^{4\pi i} = 1$. Each of those numbers, to the $n$th power, comes around the unit circle to 1.

[Figure 10.2b shows the 6 solutions to $z^6 = 1$ on the unit circle, with labeled points $e^{2\pi i/6}$, $e^{4\pi i/6}$, $e^{8\pi i/6}$, $e^{10\pi i/6}$ and $e^{12\pi i/6} = e^{2\pi i} = 1$.]

Figure 10.2: (a) Multiplying $e^{i\theta}$ times $e^{i\theta'}$. (b) The $n$th power of $e^{2\pi i/n}$ is $e^{2\pi i} = 1$.

These $n$ roots of 1 are the key numbers for signal processing. The Discrete Fourier Transform uses $w$ and its powers. Section 10.3 shows how to decompose a vector (a signal) into $n$ frequencies by the Fast Fourier Transform.
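The $n$th roots of 1 can be listed directly from $w = e^{2\pi i/n}$. This is a sketch with the standard `cmath` module; the choice $n = 6$ matches Figure 10.2b, and the variable names are illustrative.

```python
import cmath

n = 6
w = cmath.exp(2j * cmath.pi / n)            # angle 2*pi/n on the unit circle
roots = [w ** k for k in range(n)]          # 1, w, w^2, ..., w^(n-1)
nth_powers = [root ** n for root in roots]  # each should come around to 1
root_sum = sum(roots)                       # equally spaced points balance out
```

The sum of the roots is zero (up to roundoff) because the points are equally spaced around the circle.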


REVIEW OF THE KEY IDEAS

1. Adding $a + ib$ to $c + id$ is like adding $(a, b) + (c, d)$. Use $i^2 = -1$ to multiply.
2. The conjugate of $z = a + bi = re^{i\theta}$ is $\bar{z} = z^* = a - bi = re^{-i\theta}$.
3. $z$ times $\bar{z}$ is $re^{i\theta}$ times $re^{-i\theta}$. This is $r^2 = |z|^2 = a^2 + b^2$ (real).
4. Powers and products are easy in polar form $z = re^{i\theta}$. Multiply $r$'s and add $\theta$'s.


Problem Set 10.1 


Questions 1-8 are about operations on complex numbers.

1 Add and multiply each pair of complex numbers:

(a) $2 + i,\ 2 - i$ (b) $-1 + i,\ -1 + i$ (c) $\cos\theta + i\sin\theta,\ \cos\theta - i\sin\theta$

2 Locate these points on the complex plane. Simplify them if necessary:

(a) $2 + i$ (b) $(2 + i)^2$ (c) $\dfrac{1}{2 + i}$ (d) $|2 + i|$

3 Find the absolute value $r = |z|$ of these four numbers. If $\theta$ is the angle for $6 - 8i$ what are the angles for the other three numbers?

(a) $6 - 8i$ (b) $(6 - 8i)^2$ (c) $\dfrac{1}{6 - 8i}$ (d) $(6 + 8i)^2$

4 If $|z| = 2$ and $|w| = 3$ then $|z \times w| = $ ____ and $|z + w| \le$ ____ and $|z/w| = $ ____ and $|z - w| \le$ ____.


5 Find $a + ib$ for the numbers at angles 30°, 60°, 90°, 120° on the unit circle. If $w$ is the number at 30°, check that $w^2$ is at 60°. What power of $w$ equals 1?


6 If $z = r\cos\theta + ir\sin\theta$ then $1/z$ has absolute value ____ and angle ____. Its polar form is ____. Multiply $z \times 1/z$ to get 1.


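7 The complex multiplication $M = (a + bi)(c + di)$ is a 2 by 2 real multiplication

$$\begin{bmatrix} a & -b \\ b & a \end{bmatrix}\begin{bmatrix} c \\ d \end{bmatrix} = \begin{bmatrix} \underline{\quad} \\ \underline{\quad} \end{bmatrix}.$$

The right side contains the real and imaginary parts of $M$. Test $M = (1 + 3i)(1 - 3i)$.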


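8 $A = A_1 + iA_2$ is a complex $n$ by $n$ matrix and $b = b_1 + ib_2$ is a complex vector. The solution to $Ax = b$ is $x_1 + ix_2$. Write $Ax = b$ as a real system of size $2n$:

$$\textbf{Complex } n \text{ by } n, \ \textbf{Real } 2n \text{ by } 2n \qquad \begin{bmatrix} \underline{\quad} & \underline{\quad} \\ \underline{\quad} & \underline{\quad} \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} b_1 \\ b_2 \end{bmatrix}.$$

Questions 9-16 are about the conjugate $\bar{z} = a - ib = re^{-i\theta} = z^*$.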


9 Write down the complex conjugate of each number by changing $i$ to $-i$:

(a) $2 - i$ (b) $(2 - i)(1 - i)$ (c) $e^{i\pi/2}$ (which is $i$)
(d) $e^{i\pi} = -1$ (e) $\dfrac{1 + i}{1 - i}$ (which is also $i$) (f) $i^{103} = $ ____


10 The sum $z + \bar{z}$ is always ____. The difference $z - \bar{z}$ is always ____. Assume $z \neq 0$. The product $z \times \bar{z}$ is always ____. The ratio $z/\bar{z}$ always has absolute value ____.


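11 For a real matrix, the conjugate of $Ax = \lambda x$ is $A\bar{x} = \bar{\lambda}\bar{x}$. This proves two things: $\bar{\lambda}$ is another eigenvalue and $\bar{x}$ is its eigenvector. Find the eigenvalues $\lambda, \bar{\lambda}$ and eigenvectors $x, \bar{x}$ of $A = \begin{bmatrix} a & b \\ -b & a \end{bmatrix}$.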


12 The eigenvalues of a real 2 by 2 matrix come from the quadratic formula:

$$\det\begin{bmatrix} a - \lambda & b \\ c & d - \lambda \end{bmatrix} = \lambda^2 - (a + d)\lambda + (ad - bc) = 0$$

gives the two eigenvalues $\lambda = \left[a + d \pm \sqrt{(a + d)^2 - 4(ad - bc)}\,\right]/2$.

(a) If $a = b = d = 1$, the eigenvalues are complex when $c$ is ____.
(b) What are the eigenvalues when $ad = bc$?
(c) The two eigenvalues (plus sign and minus sign) are not always conjugates of each other. Why not?

13 In Problem 12 the eigenvalues are not real when (trace)$^2 = (a + d)^2$ is smaller than ____. Show that the $\lambda$'s are real when $bc > 0$.


14 Find the eigenvalues and eigenvectors of this permutation matrix:

$$P_4 = \begin{bmatrix} 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix} \quad \text{has } \det(P_4 - \lambda I) = \underline{\quad}.$$

15 Extend $P_4$ above to $P_6$ (five 1's below the diagonal and one in the corner). Find $\det(P_6 - \lambda I)$ and the six eigenvalues in the complex plane.


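16 A real skew-symmetric matrix ($A^T = -A$) has pure imaginary eigenvalues. First proof: If $Ax = \lambda x$ then block multiplication gives

$$\begin{bmatrix} 0 & A \\ -A & 0 \end{bmatrix}\begin{bmatrix} x \\ ix \end{bmatrix} = i\lambda\begin{bmatrix} x \\ ix \end{bmatrix}.$$

This block matrix is symmetric. Its eigenvalues must be ____! So $\lambda$ is ____.

Questions 17-24 are about the form $re^{i\theta}$ of the complex number $r\cos\theta + ir\sin\theta$.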


17 Write these numbers in Euler's form $re^{i\theta}$. Then square each number:
(a) $1 + \sqrt{3}\,i$ (b) $\cos 2\theta + i\sin 2\theta$ (c) $-7i$ (d) $5 - 5i$.


18 Find the absolute value and the angle for $z = \sin\theta + i\cos\theta$ (careful). Locate this $z$ in the complex plane. Multiply $z$ by $\cos\theta + i\sin\theta$ to get ____.


19 Draw all eight solutions of $z^8 = 1$ in the complex plane. What is the rectangular form $a + ib$ of the root $z = \bar{w} = \exp(-2\pi i/8)$?


20 Locate the cube roots of 1 in the complex plane. Locate the cube roots of $-1$. Together these are the sixth roots of ____.


21 By comparing $e^{3i\theta} = \cos 3\theta + i\sin 3\theta$ with $(e^{i\theta})^3 = (\cos\theta + i\sin\theta)^3$, find the "triple angle" formulas for $\cos 3\theta$ and $\sin 3\theta$ in terms of $\cos\theta$ and $\sin\theta$.


22 Suppose the conjugate $\bar{z}$ is equal to the reciprocal $1/z$. What are all possible $z$'s?


23 (a) Why do $e^i$ and $i^e$ both have absolute value 1?
(b) In the complex plane put stars near the points $e^i$ and $i^e$.
(c) The number $i^e$ could be $(e^{i\pi/2})^e$ or $(e^{5i\pi/2})^e$. Are those equal?

24 Draw the paths of these numbers from $t = 0$ to $t = 2\pi$ in the complex plane:

(a) $e^{it}$ (b) $e^{(-1+i)t} = e^{-t}e^{it}$ (c) $(-1)^t = e^{t\pi i}$.

10.2 Hermitian and Unitary Matrices

The main message of this section can be presented in one sentence: When you transpose a complex vector $z$ or matrix $A$, take the complex conjugate too. Don't stop at $z^T$ or $A^T$. Reverse the signs of all imaginary parts. From a column vector with $z_j = a_j + ib_j$, the good row vector is the conjugate transpose with components $a_j - ib_j$:

$$\textbf{Conjugate transpose} \qquad \bar{z}^T = \begin{bmatrix} \bar{z}_1 & \cdots & \bar{z}_n \end{bmatrix} = \begin{bmatrix} a_1 - ib_1 & \cdots & a_n - ib_n \end{bmatrix}. \tag{1}$$


Here is one reason to go to $\bar{z}$. The length squared of a real vector is $x_1^2 + \cdots + x_n^2$. The length squared of a complex vector is not $z_1^2 + \cdots + z_n^2$. With that wrong definition, the length of $(1, i)$ would be $1^2 + i^2 = 0$. A nonzero vector would have zero length, which is not good. Other vectors would have complex lengths. Instead of $(a + bi)^2$ we want $a^2 + b^2$, the absolute value squared. This is $(a + bi)$ times $(a - bi)$.

For each component we want $z_j$ times $\bar{z}_j$, which is $|z_j|^2 = a_j^2 + b_j^2$. That comes when the components of $z$ multiply the components of $\bar{z}$:

$$\textbf{Length squared} \qquad \begin{bmatrix} \bar{z}_1 & \cdots & \bar{z}_n \end{bmatrix}\begin{bmatrix} z_1 \\ \vdots \\ z_n \end{bmatrix} = |z_1|^2 + \cdots + |z_n|^2. \quad \text{This is } \bar{z}^Tz = \|z\|^2. \tag{2}$$

Now the squared length of $(1, i)$ is $1^2 + |i|^2 = 2$. The length is $\sqrt{2}$. The squared length of $(1 + i, 1 - i)$ is 4. The only vectors with zero length are zero vectors.
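The difference between the right and wrong definitions shows up immediately in a computation. This is a minimal sketch with Python's complex numbers; both function names are hypothetical.

```python
def squared_length(z):
    """Equation (2): multiply each component by its conjugate and add."""
    return sum((zj.conjugate() * zj).real for zj in z)

def wrong_squared_length(z):
    """The real-vector formula without conjugates (not a length for complex z)."""
    return sum(zj * zj for zj in z)

v = [1 + 0j, 1j]               # the vector (1, i)
w = [1 + 1j, 1 - 1j]           # the vector (1 + i, 1 - i)

right_v = squared_length(v)
wrong_v = wrong_squared_length(v)
right_w = squared_length(w)
```

Without the conjugate, the nonzero vector $(1, i)$ would be given length zero.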


$$\bar{z}^Tz = z^Hz = |z_1|^2 + \cdots + |z_n|^2$$


Before going further we replace two symbols by one symbol. Instead of a bar for the conjugate and $T$ for the transpose, we just use a superscript $H$. Thus $\bar{z}^T = z^H$. This is "z Hermitian," the conjugate transpose of $z$. The new word is pronounced "Hermeeshan." The new symbol applies also to matrices: The conjugate transpose of a matrix $A$ is $A^H$.

Another popular notation is $A^*$. The MATLAB transpose command ' automatically takes complex conjugates ($A'$ is $A^H$).

The vector $z^H$ is $\bar{z}^T$. The matrix $A^H$ is $\bar{A}^T$, the conjugate transpose of $A$:

$$A^H = \text{"A Hermitian"} \qquad \text{If } A = \begin{bmatrix} 1 & i \\ 0 & 1 + i \end{bmatrix} \text{ then } A^H = \begin{bmatrix} 1 & 0 \\ -i & 1 - i \end{bmatrix}.$$


Complex Inner Products 


For real vectors, the length squared is $x^Tx$, the inner product of $x$ with itself. For complex vectors, the length squared is $z^Hz$. It will be very desirable if $z^Hz$ is the inner product of $z$ with itself. To make that happen, the complex inner product should use the conjugate transpose (not just the transpose). The inner product sees no change when the vectors are real, but there is a definite effect from choosing $\bar{u}^T$, when $u$ is complex:

DEFINITION The inner product of real or complex vectors $u$ and $v$ is $u^Hv$:

$$u^Hv = \begin{bmatrix} \bar{u}_1 & \cdots & \bar{u}_n \end{bmatrix}\begin{bmatrix} v_1 \\ \vdots \\ v_n \end{bmatrix} = \bar{u}_1v_1 + \cdots + \bar{u}_nv_n. \tag{3}$$

With complex vectors, $u^Hv$ is different from $v^Hu$. The order of the vectors is now important. In fact $v^Hu = \bar{v}_1u_1 + \cdots + \bar{v}_nu_n$ is the complex conjugate of $u^Hv$. We have to put up with a few inconveniences for the greater good.


Example 1 The inner product of $u = \begin{bmatrix} 1 \\ i \end{bmatrix}$ with $v = \begin{bmatrix} i \\ 1 \end{bmatrix}$ is $\begin{bmatrix} 1 & -i \end{bmatrix}\begin{bmatrix} i \\ 1 \end{bmatrix} = 0$.

Example 1 is surprising. Those vectors $(1, i)$ and $(i, 1)$ don't look perpendicular. But they are. A zero inner product still means that the (complex) vectors are orthogonal. Similarly the vector $(1, i)$ is orthogonal to the vector $(1, -i)$. Their inner product is $1 - 1 = 0$. We are correctly getting zero for the inner product, where we would be incorrectly getting zero for the length of $(1, i)$ if we forgot to take the conjugate.
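Definition (3) is a one-line function. The sketch below conjugates the first vector, as the text does; the name `inner` and the two extra test vectors `p` and `q` are illustrative assumptions.

```python
def inner(u, v):
    """u^H v: conjugate the components of the first vector, then multiply and add."""
    return sum(a.conjugate() * b for a, b in zip(u, v))

u = [1 + 0j, 1j]                 # (1, i)
v = [1j, 1 + 0j]                 # (i, 1)
w = [1 + 0j, -1j]                # (1, -i)

uv = inner(u, v)                 # Example 1
uw = inner(u, w)                 # (1, i) against (1, -i)
uu = inner(u, u)                 # squared length of (1, i)

# The order matters: v^H u is the conjugate of u^H v
p = [1 + 2j, 3j]
q = [2 - 1j, 1 + 1j]
pq, qp = inner(p, q), inner(q, p)
```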


Note We have chosen to conjugate the first vector $u$. Some authors choose the second vector $v$. Their complex inner product would be $u^T\bar{v}$. It is a free choice, as long as we stick to it. We wanted to use the single symbol $^H$ in the next formula too:

The inner product of $Au$ with $v$ equals the inner product of $u$ with $A^Hv$:

$$A^H = \text{"adjoint" of } A \qquad (Au)^Hv = u^H(A^Hv). \tag{4}$$

The conjugate of $Au$ is $\bar{A}\bar{u}$. Transposing it gives $\bar{u}^T\bar{A}^T$ as usual. This is $u^HA^H$. Everything that should work, does work. The rule for $^H$ comes from the rule for $^T$. That applies to products of matrices:

The conjugate transpose of $AB$ is $(AB)^H = B^HA^H$.

We constantly use the fact that $(a - ib)(c - id)$ is the conjugate of $(a + ib)(c + id)$.


Hermitian Matrices 


Among real matrices, the symmetric matrices form the most important special class: $A = A^T$. They have real eigenvalues and a full set of orthogonal eigenvectors. The diagonalizing matrix $S$ is an orthogonal matrix $Q$. Every symmetric matrix can be written as $A = Q\Lambda Q^{-1}$ and also as $A = Q\Lambda Q^T$ (because $Q^{-1} = Q^T$). All this follows from $a_{ij} = a_{ji}$, when $A$ is real.

Among complex matrices, the special class contains the Hermitian matrices: $A = A^H$. The condition on the entries is $a_{ij} = \overline{a_{ji}}$. In this case we say that "$A$ is Hermitian." Every real symmetric matrix is Hermitian, because taking its conjugate has no effect. The next matrix is also Hermitian, $A = A^H$:

$$\textbf{Example 2} \qquad A = \begin{bmatrix} 2 & 3 - 3i \\ 3 + 3i & 5 \end{bmatrix}$$

The main diagonal is real since $a_{ii} = \overline{a_{ii}}$. Across it are conjugates $3 + 3i$ and $3 - 3i$.


This example will illustrate the three crucial properties of all Hermitian matrices. 


If $A = A^H$ and $z$ is any vector, the number $z^H A z$ is real.


Quick proof: $z^H A z$ is certainly 1 by 1. Take its conjugate transpose:

$$(z^H A z)^H = z^H A^H (z^H)^H \quad\text{which is } z^H A z \text{ again.}$$

This used $A = A^H$. So the number $z^H A z$ equals its conjugate and must be real. Here is
that “energy” $z^H A z$ in our example:


$$\begin{bmatrix} \bar z_1 & \bar z_2 \end{bmatrix}\begin{bmatrix} 2 & 3-3i \\ 3+3i & 5 \end{bmatrix}\begin{bmatrix} z_1 \\ z_2 \end{bmatrix} = \underbrace{2\bar z_1 z_1 + 5\bar z_2 z_2}_{\text{diagonal}} + \underbrace{(3-3i)\bar z_1 z_2 + (3+3i) z_1 \bar z_2}_{\text{off-diagonal}}.$$


The terms $2|z_1|^2$ and $5|z_2|^2$ from the diagonal are both real. The off-diagonal terms are
conjugates of each other, so their sum is real. (The imaginary parts cancel when we add.)
The whole expression $z^H A z$ is real, and this will make $\lambda$ real.
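The reality of the energy can be checked numerically. The following is a minimal sketch in Python (chosen here only for illustration; the helper name `energy` and the test vector `z` are assumptions, not part of the text). It evaluates $z^H A z$ for the Hermitian matrix of the example.

```python
# Hermitian matrix from the example: A equals its conjugate transpose.
A = [[2, 3 - 3j],
     [3 + 3j, 5]]

def energy(A, z):
    """Return z^H A z for a square matrix A (list of rows) and a vector z."""
    n = len(z)
    return sum(z[i].conjugate() * A[i][j] * z[j]
               for i in range(n) for j in range(n))

z = [1 + 2j, -3 + 1j]      # arbitrary complex test vector (an assumption)
e = energy(A, z)
print(e.real, e.imag)      # the imaginary part cancels
```

Any other complex vector `z` should give a zero imaginary part as well, since the two off-diagonal terms are conjugates of each other.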


Every eigenvalue of a Hermitian matrix is real.


Proof Suppose $Az = \lambda z$. Multiply both sides by $z^H$ to get $z^H A z = \lambda z^H z$. On the left
side, $z^H A z$ is real. On the right side, $z^H z$ is the length squared, real and positive. So the
ratio $\lambda = z^H A z / z^H z$ is a real number. Q.E.D.


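The example above has eigenvalues $\lambda = 8$ and $\lambda = -1$, real because $A = A^H$:


$$\begin{vmatrix} 2-\lambda & 3-3i \\ 3+3i & 5-\lambda \end{vmatrix} = \lambda^2 - 7\lambda + 10 - |3+3i|^2 = \lambda^2 - 7\lambda + 10 - 18 = (\lambda - 8)(\lambda + 1).$$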


If $Az = \lambda z$ and $Ay = \beta y$ (different eigenvalues) then $y^H z = 0$.


Proof Multiply $Az = \lambda z$ on the left by $y^H$. Multiply $y^H A^H = \beta y^H$ on the right by $z$:


$$y^H A z = \lambda y^H z \quad\text{and}\quad y^H A^H z = \beta y^H z. \quad (5)$$

The left sides are equal because $A = A^H$. Therefore the right sides are equal. Since $\beta$ is
different from $\lambda$, the other factor $y^H z$ must be zero. The eigenvectors are orthogonal, as in
our example with $\lambda = 8$ and $\beta = -1$:


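$$(A - 8I)z = \begin{bmatrix} -6 & 3-3i \\ 3+3i & -3 \end{bmatrix}\begin{bmatrix} z_1 \\ z_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix} \qquad z = \begin{bmatrix} 1 \\ 1+i \end{bmatrix}$$

$$(A + I)y = \begin{bmatrix} 3 & 3-3i \\ 3+3i & 6 \end{bmatrix}\begin{bmatrix} y_1 \\ y_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix} \qquad y = \begin{bmatrix} 1-i \\ -1 \end{bmatrix}$$


Take the inner product of those eigenvectors $y$ and $z$:


Orthogonal eigenvectors $y^H z = \begin{bmatrix} 1+i & -1 \end{bmatrix}\begin{bmatrix} 1 \\ 1+i \end{bmatrix} = 0.$

These eigenvectors have squared length $1^2 + 1^2 + 1^2 = 3$. After division by $\sqrt 3$ they are
unit vectors. They were orthogonal, now they are orthonormal. They go into the columns
of the eigenvector matrix $S$, which diagonalizes $A$.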
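As a quick check of the example, the sketch below (Python, standard library only; the helper names `matvec` and `inner` are illustrative assumptions) confirms that $z$ and $y$ are eigenvectors for $\lambda = 8$ and $\lambda = -1$, and that their complex inner product is zero.

```python
A = [[2, 3 - 3j],
     [3 + 3j, 5]]

def matvec(M, x):
    """Matrix times vector, with M stored as a list of rows."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in M]

def inner(u, v):
    """Complex inner product u^H v: conjugate the first vector."""
    return sum(ui.conjugate() * vi for ui, vi in zip(u, v))

z = [1 + 0j, 1 + 1j]       # eigenvector for lambda = 8
y = [1 - 1j, -1 + 0j]      # eigenvector for lambda = -1

print(matvec(A, z))        # equals 8 * z
print(matvec(A, y))        # equals -1 * y
print(inner(y, z))         # zero: the eigenvectors are orthogonal
print(inner(z, z).real)    # squared length 3, so divide by sqrt(3)
```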

When A is real and symmetric, S is Q—an orthogonal matrix. Now A is complex and 
Hermitian. Its eigenvectors are complex and orthonormal. The eigenvector matrix S is like 
Q, but complex. We now assign a new name “unitary” and a new letter U to a complex 
orthogonal matrix. 


Unitary Matrices 


A unitary matrix U is a (complex) square matrix that has orthonormal columns. 
U is the complex equivalent of Q. The eigenvectors of A give a perfect example: 


Unitary matrix $U = \dfrac{1}{\sqrt 3}\begin{bmatrix} 1 & 1-i \\ 1+i & -1 \end{bmatrix}.$


This $U$ is also a Hermitian matrix. I didn’t expect that! The example is almost too perfect.
We will see that the eigenvalues of this $U$ must be 1 and $-1$.

The matrix test for real orthonormal columns was $Q^T Q = I$. When $Q^T$ multiplies $Q$,
the zero inner products appear off the diagonal. In the complex case, $Q$ becomes $U$. The
columns show themselves as orthonormal when $U^H$ multiplies $U$. The inner products of
the columns are again 1 and 0. They fill up $U^H U = I$.


Suppose $U$ (with orthonormal columns) multiplies any $z$. The vector length stays the
same, because $z^H U^H U z = z^H z$. If $z$ is an eigenvector of $U$ we learn something more:
The eigenvalues of unitary (and orthogonal) matrices all have absolute value $|\lambda| = 1$.

If $U$ is unitary then $\|Uz\| = \|z\|$. Therefore $Uz = \lambda z$ leads to $|\lambda| = 1$.


Our 2 by 2 example is both Hermitian ($U = U^H$) and unitary ($U^{-1} = U^H$). That
means real eigenvalues ($\lambda = \bar\lambda$), and it means $|\lambda| = 1$. A real number with absolute value
1 has only two possibilities: The eigenvalues are 1 or $-1$.


Since the trace is zero for our $U$, one eigenvalue is $\lambda = 1$ and the other is $\lambda = -1$.
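These properties of the 2 by 2 example can be verified directly. The sketch below is an illustration in Python, not part of the text; the helper names and the test vector `z` are assumptions. It checks that $U^H U = I$, that $U = U^H$, and that multiplying by $U$ preserves length.

```python
import math

s = 1 / math.sqrt(3)
U = [[s + 0j, s * (1 - 1j)],
     [s * (1 + 1j), -s + 0j]]

def conj_transpose(M):
    """Return M^H: transpose M and conjugate every entry."""
    return [[M[r][c].conjugate() for r in range(len(M))]
            for c in range(len(M[0]))]

def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def matvec(M, x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in M]

def length(x):
    return math.sqrt(sum(abs(v) ** 2 for v in x))

UhU = matmul(conj_transpose(U), U)     # should be the identity
z = [2 - 1j, 0.5 + 3j]                 # arbitrary test vector (an assumption)
print(UhU)
print(length(matvec(U, z)), length(z)) # the two lengths agree
print(U[0][0] + U[1][1])               # trace is zero
```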


Example 3 The 3 by 3 Fourier matrix is in Figure 10.3. Is it Hermitian? Is it uni- 
tary? F3 is certainly symmetric. It equals its transpose. But it doesn’t equal its conjugate 
transpose—it is not Hermitian. If you change i to —i, you get a different matrix. 


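Fourier matrix $F = \dfrac{1}{\sqrt 3}\begin{bmatrix} 1 & 1 & 1 \\ 1 & e^{2\pi i/3} & e^{4\pi i/3} \\ 1 & e^{4\pi i/3} & e^{2\pi i/3} \end{bmatrix}$

[Figure: the three points $1$, $e^{2\pi i/3}$, $e^{4\pi i/3}$ marked on the unit circle.]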


Figure 10.3: The cube roots of 1 go into the Fourier matrix F = F3. 


Is $F$ unitary? Yes. The squared length of every column is $\frac13(1 + 1 + 1)$ (unit vector).
The first column is orthogonal to the second column because $1 + e^{2\pi i/3} + e^{4\pi i/3} = 0$.
This is the sum of the three numbers marked in Figure 10.3.

Notice the symmetry of the figure. If you rotate it by 120°, the three points are in the 
same position. Therefore their sum S also stays in the same position! The only possible 
sum in the same position after 120° rotation is S = 0. 

Is column 2 of F orthogonal to column 3? Their dot product looks like 


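$$\tfrac13\left(1 + e^{6\pi i/3} + e^{6\pi i/3}\right) = \tfrac13(1 + 1 + 1).$$


This is not zero. The answer is wrong because we forgot to take complex conjugates. The
complex inner product uses $H$ not $T$:


$$(\text{column 2})^H(\text{column 3}) = \tfrac13\left(1 \cdot 1 + e^{-2\pi i/3}e^{4\pi i/3} + e^{-4\pi i/3}e^{2\pi i/3}\right) = \tfrac13\left(1 + e^{2\pi i/3} + e^{-2\pi i/3}\right) = 0.$$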


So we do have orthogonality. Conclusion: F is a unitary matrix. 
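The same computation can be sketched in code. The Python below is an illustration under stated assumptions (the names `cols`, `gram`, and `wrong` are hypothetical). It builds the 3 by 3 Fourier matrix with the factor $1/\sqrt 3$, forms all inner products of columns with the conjugate, and repeats the mistaken dot product without the conjugate.

```python
import cmath
import math

n = 3
w = cmath.exp(2j * math.pi / n)                      # cube root of 1
F = [[w ** (j * k) / math.sqrt(n) for k in range(n)] for j in range(n)]

def inner(u, v):
    """Complex inner product u^H v (conjugate the first vector)."""
    return sum(ui.conjugate() * vi for ui, vi in zip(u, v))

cols = [[F[j][k] for j in range(n)] for k in range(n)]

# Inner products of all pairs of columns: this is F^H F.
gram = [[inner(cols[a], cols[b]) for b in range(n)] for a in range(n)]

# Forgetting the conjugate for columns 2 and 3 gives 1 instead of 0.
wrong = sum(p * q for p, q in zip(cols[1], cols[2]))

print(gram)    # close to the identity matrix
print(wrong)   # close to 1, not 0
```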

The next section will study the $n$ by $n$ Fourier matrices. Among all complex unitary
matrices, these are the most important. When we multiply a vector by $F$, we are computing
its Discrete Fourier Transform. When we multiply by $F^{-1}$, we are computing the
inverse transform. The special property of unitary matrices is that $F^{-1} = F^H$. The inverse
transform only differs by changing $i$ to $-i$:


Change $i$ to $-i$: $\quad F^{-1} = F^H = \dfrac{1}{\sqrt 3}\begin{bmatrix} 1 & 1 & 1 \\ 1 & e^{-2\pi i/3} & e^{-4\pi i/3} \\ 1 & e^{-4\pi i/3} & e^{-2\pi i/3} \end{bmatrix}$


Everyone who works with F recognizes its value. The last section of the book will bring 
together Fourier analysis and complex numbers and linear algebra. 

This section ends with a table to translate between real and complex—for vectors and 
for matrices: 


Real versus Complex

$\mathbf{R}^n$: vectors with $n$ real components ↔ $\mathbf{C}^n$: vectors with $n$ complex components
length: $\|x\|^2 = x_1^2 + \cdots + x_n^2$ ↔ length: $\|z\|^2 = |z_1|^2 + \cdots + |z_n|^2$
transpose: $(A^T)_{ij} = A_{ji}$ ↔ conjugate transpose: $(A^H)_{ij} = \overline{A_{ji}}$
product rule: $(AB)^T = B^T A^T$ ↔ product rule: $(AB)^H = B^H A^H$
dot product: $x^T y = x_1 y_1 + \cdots + x_n y_n$ ↔ inner product: $u^H v = \bar u_1 v_1 + \cdots + \bar u_n v_n$
reason for $A^T$: $(Ax)^T y = x^T (A^T y)$ ↔ reason for $A^H$: $(Au)^H v = u^H (A^H v)$
orthogonality: $x^T y = 0$ ↔ orthogonality: $u^H v = 0$
symmetric matrices: $A = A^T$ ↔ Hermitian matrices: $A = A^H$
$A = Q\Lambda Q^{-1} = Q\Lambda Q^T$ (real $\Lambda$) ↔ $A = U\Lambda U^{-1} = U\Lambda U^H$ (real $\Lambda$)
skew-symmetric matrices: $K^T = -K$ ↔ skew-Hermitian matrices: $K^H = -K$
orthogonal matrices: $Q^T = Q^{-1}$ ↔ unitary matrices: $U^H = U^{-1}$
orthonormal columns: $Q^T Q = I$ ↔ orthonormal columns: $U^H U = I$
$(Qx)^T(Qy) = x^T y$ and $\|Qx\| = \|x\|$ ↔ $(Ux)^H(Uy) = x^H y$ and $\|Uz\| = \|z\|$


The columns and also the eigenvectors of $Q$ and $U$ are orthonormal. Every $|\lambda| = 1$.


Problem Set 10.2 

1 Find the lengths of $u = (1 + i, 1 - i, 1 + 2i)$ and $v = (i, i, i)$. Also find $u^H v$ and
$v^H u$.

2 Compute $A^H A$ and $AA^H$. Those are both ____ matrices:


3 Solve $Az = 0$ to find a vector in the nullspace of $A$ in Problem 2. Show that $z$ is
orthogonal to the columns of $A^H$. Show that $z$ is not orthogonal to the columns of
$A^T$. The good row space is no longer $C(A^T)$. Now it is $C(A^H)$.

4 Problem 3 indicates that the four fundamental subspaces are $C(A)$ and $N(A)$ and
____ and ____. Their dimensions are still $r$ and $n - r$ and $r$ and $m - r$. They are
still orthogonal subspaces. The symbol $H$ takes the place of $T$.

5 (a) Prove that $A^H A$ is always a Hermitian matrix.
(b) If $Az = 0$ then $A^H A z = 0$. If $A^H A z = 0$, multiply by $z^H$ to prove that
$Az = 0$. The nullspaces of $A$ and $A^H A$ are ____. Therefore $A^H A$ is an
invertible Hermitian matrix when the nullspace of $A$ contains only $z = 0$.

6 True or false (give a reason if true or a counterexample if false):
(a) If $A$ is a real matrix then $A + iI$ is invertible.
(b) If $A$ is a Hermitian matrix then $A + iI$ is invertible.
(c) If $U$ is a unitary matrix then $U + iI$ is invertible.

7 When you multiply a Hermitian matrix by a real number $c$, is $cA$ still Hermitian?
Show that $iA$ is skew-Hermitian when $A$ is Hermitian. The 3 by 3 Hermitian matrices
are a subspace provided the “scalars” are real numbers.

8 Which classes of matrices does $P$ belong to: invertible, Hermitian, unitary?

$$P = \begin{bmatrix} 0 & i & 0 \\ 0 & 0 & i \\ i & 0 & 0 \end{bmatrix}$$

Compute $P^2$, $P^3$, and $P^{100}$. What are the eigenvalues of $P$?

9 Find the unit eigenvectors of $P$ in Problem 8, and put them into the columns of a
unitary matrix $F$. What property of $P$ makes these eigenvectors orthogonal?

10 Write down the 3 by 3 circulant matrix $C = 2I + 5P$. It has the same eigenvectors
as $P$ in Problem 8. Find its eigenvalues.

11 If $U$ and $V$ are unitary matrices, show that $U^{-1}$ is unitary and also $UV$ is unitary.
Start from $U^H U = I$ and $V^H V = I$.

12 How do you know that the determinant of every Hermitian matrix is real?

13 The matrix $A^H A$ is not only Hermitian but also positive definite, when the columns
of $A$ are independent. Proof: $z^H A^H A z$ is positive if $z$ is nonzero because ____.

14 Diagonalize this Hermitian matrix to reach $A = U\Lambda U^H$:

$$A = \begin{bmatrix} 0 & 1-i \\ 1+i & 1 \end{bmatrix}$$

15 Diagonalize this skew-Hermitian matrix to reach $K = U\Lambda U^H$. All $\lambda$’s are ____:

$$K = \begin{bmatrix} 0 & -1+i \\ 1+i & i \end{bmatrix}$$

16 Diagonalize this orthogonal matrix to reach $Q = U\Lambda U^H$. Now all $\lambda$’s are ____:

$$Q = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}$$

17 Diagonalize this unitary matrix $V$ to reach $V = U\Lambda U^H$. Again all $\lambda$’s are ____:

$$V = \frac{1}{\sqrt 3}\begin{bmatrix} 1 & 1-i \\ 1+i & -1 \end{bmatrix}$$

18 If $v_1, \ldots, v_n$ is an orthonormal basis for $\mathbf{C}^n$, the matrix with those columns is a
____ matrix. Show that any vector $z$ equals $(v_1^H z)v_1 + \cdots + (v_n^H z)v_n$.

19 The functions $e^{-ix}$ and $e^{ix}$ are orthogonal on the interval $0 \le x \le 2\pi$ because their
inner product is $\int_0^{2\pi}$ ____ $= 0$.

20 The vectors $v = (1, i, 1)$, $w = (i, 1, 0)$ and $z =$ ____ are an orthogonal basis for ____.

21 If $A = R + iS$ is a Hermitian matrix, are its real and imaginary parts symmetric?

22 The (complex) dimension of $\mathbf{C}^n$ is ____. Find a non-real basis for $\mathbf{C}^n$.

23 Describe all 1 by 1 and 2 by 2 Hermitian matrices and unitary matrices.

24 How are the eigenvalues of $A^H$ related to the eigenvalues of the square complex
matrix $A$?

25 If $u^H u = 1$ show that $I - 2uu^H$ is Hermitian and also unitary. The rank-one matrix
$uu^H$ is the projection onto what line in $\mathbf{C}^n$?

26 If $A + iB$ is a unitary matrix ($A$ and $B$ are real) show that $Q = \begin{bmatrix} A & -B \\ B & A \end{bmatrix}$ is an
orthogonal matrix.

27 If $A + iB$ is Hermitian ($A$ and $B$ are real) show that $\begin{bmatrix} A & -B \\ B & A \end{bmatrix}$ is symmetric.

28 Prove that the inverse of a Hermitian matrix is also Hermitian (transpose $A^{-1}A = I$).

29 Diagonalize this matrix by constructing its eigenvalue matrix $\Lambda$ and its eigenvector
matrix $S$:

$$A = \begin{bmatrix} 2 & 1-i \\ 1+i & 3 \end{bmatrix} = A^H$$

30 A matrix with orthonormal eigenvectors has the form $A = U\Lambda U^{-1} = U\Lambda U^H$.
Prove that $AA^H = A^H A$. These are exactly the normal matrices. Examples are
Hermitian, skew-Hermitian, and unitary matrices. Construct a 2 by 2 normal matrix
by choosing complex eigenvalues in $\Lambda$.


10.3 The Fast Fourier Transform


Many applications of linear algebra take time to develop. It is not easy to explain them 
in an hour. The teacher and the author must choose between completing the theory and 
adding new applications. Often the theory wins, but this section is an exception. It explains 
the most valuable numerical algorithm in the last century. 

We want to multiply quickly by $F$ and $F^{-1}$, the Fourier matrix and its inverse. This
is achieved by the Fast Fourier Transform. An ordinary product $Fc$ uses $n^2$ multiplications
($F$ has $n^2$ entries). The FFT needs only $n$ times $\frac12 \log_2 n$. We will see how.

The FFT has revolutionized signal processing. Whole industries are speeded up by this
one idea. Electrical engineers are the first to know the difference: they take your Fourier
transform as they meet you (if you are a function). Fourier’s idea is to represent $f$ as a
sum of harmonics $c_k e^{ikx}$. The function is seen in frequency space through the coefficients
$c_k$, instead of physical space through its values $f(x)$. The passage backward and forward
between $c$’s and $f$’s is by the Fourier transform. Fast passage is by the FFT.


Roots of Unity and the Fourier Matrix 


Quadratic equations have two roots (or one repeated root). Equations of degree $n$ have $n$
roots (counting repetitions). This is the Fundamental Theorem of Algebra, and to make it
true we must allow complex roots. This section is about the very special equation $z^n = 1$.
The solutions $z$ are the “$n$th roots of unity.” They are $n$ evenly spaced points around the
unit circle in the complex plane.

Figure 10.4 shows the eight solutions to $z^8 = 1$. Their spacing is $\frac18(360°) = 45°$. The
first root is at 45° or $\theta = 2\pi/8$ radians. It is the complex number $w = e^{i\theta} = e^{i2\pi/8}$.
We call this number $w_8$ to emphasize that it is an 8th root. You could write it in terms of
$\cos\frac{2\pi}{8}$ and $\sin\frac{2\pi}{8}$, but don’t do it. The seven other 8th roots are $w^2, w^3, \ldots, w^8$, going
around the circle. Powers of $w$ are best in polar form, because we work only with the
angles $\frac{2\pi}{8}, \frac{4\pi}{8}, \ldots, \frac{16\pi}{8} = 2\pi$.

[Figure: the eight 8th roots of 1 on the unit circle in the complex plane, with the real axis marked. The root $w = e^{2\pi i/8} = \cos\frac{2\pi}{8} + i\sin\frac{2\pi}{8}$ is at 45°, and $w^2 = i$.]


Figure 10.4: The eight solutions to $z^8 = 1$ are $1, w, w^2, \ldots, w^7$ with $w = (1 + i)/\sqrt 2$.

The fourth roots of 1 are also in the figure. They are $i, -1, -i, 1$. The angle is now
$2\pi/4$ or 90°. The first root $w_4 = e^{2\pi i/4}$ is nothing but $i$. Even the square roots of 1
are seen, with $w_2 = e^{i2\pi/2} = -1$. Do not despise those square roots 1 and $-1$. The
idea behind the FFT is to go from an 8 by 8 Fourier matrix (containing powers of $w_8$)
to the 4 by 4 matrix below (with powers of $w_4 = i$). The same idea goes from 4 to 2.
By exploiting the connections of $F_8$ down to $F_4$ and up to $F_{16}$ (and beyond), the FFT
makes multiplication by $F_{1024}$ very quick.
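The roots of unity are easy to generate in polar form. The following Python sketch is only an illustration (the function name `roots_of_unity` is an assumption). It lists the eight solutions of $z^8 = 1$ and checks the relations mentioned above: $w_8 = (1+i)/\sqrt 2$, $w_8^2 = w_4 = i$, and $w_8^4 = w_2 = -1$.

```python
import cmath
import math

def roots_of_unity(n):
    """Return 1, w, w^2, ..., w^(n-1) with w = e^(2 pi i / n)."""
    w = cmath.exp(2j * math.pi / n)
    return [w ** k for k in range(n)]

r8 = roots_of_unity(8)
w8 = r8[1]

print(w8, (1 + 1j) / math.sqrt(2))   # the first root is at 45 degrees
print(w8 ** 2)                       # w_4 = i (up to roundoff)
print(w8 ** 4)                       # w_2 = -1 (up to roundoff)
print(sum(r8))                       # the eight roots add to zero
```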

We describe the Fourier matrix, first for $n = 4$. Its rows contain powers of 1 and $w$ and
$w^2$ and $w^3$. These are the fourth roots of 1, and their powers come in a special order.


$$\text{Fourier matrix } (n = 4)\qquad F = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & w & w^2 & w^3 \\ 1 & w^2 & w^4 & w^6 \\ 1 & w^3 & w^6 & w^9 \end{bmatrix} = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & i & i^2 & i^3 \\ 1 & i^2 & i^4 & i^6 \\ 1 & i^3 & i^6 & i^9 \end{bmatrix}$$


The matrix is symmetric ($F = F^T$). It is not Hermitian. Its main diagonal is not real. But
$\frac12 F$ is a unitary matrix, which means that $(\frac12 F^H)(\frac12 F) = I$.


The inverse changes from $w = i$ to $\bar w = -i$. That takes us from $F$ to $\bar F$. When the Fast
Fourier Transform gives a quick way to multiply by $F$, it does the same for $F^{-1}$.

The unitary matrix is $U = F/\sqrt n$. We avoid that $\sqrt n$ and just put $\frac{1}{n}$ into $F^{-1} = \frac{1}{n}\bar F$. The
main point is to multiply $F$ times the Fourier coefficients $c_0, c_1, c_2, c_3$:


$$\text{4-point Fourier series}\qquad \begin{bmatrix} y_0 \\ y_1 \\ y_2 \\ y_3 \end{bmatrix} = Fc = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & w & w^2 & w^3 \\ 1 & w^2 & w^4 & w^6 \\ 1 & w^3 & w^6 & w^9 \end{bmatrix}\begin{bmatrix} c_0 \\ c_1 \\ c_2 \\ c_3 \end{bmatrix} \quad (1)$$


The input is four complex coefficients $c_0, c_1, c_2, c_3$. The output is four function values
$y_0, y_1, y_2, y_3$. The first output $y_0 = c_0 + c_1 + c_2 + c_3$ is the value of the Fourier series at
$x = 0$. The second output is the value of that series $\sum c_k e^{ikx}$ at $x = 2\pi/4$:


$$y_1 = c_0 + c_1 e^{i2\pi/4} + c_2 e^{i4\pi/4} + c_3 e^{i6\pi/4} = c_0 + c_1 w + c_2 w^2 + c_3 w^3.$$


The third and fourth outputs $y_2$ and $y_3$ are the values of $\sum c_k e^{ikx}$ at $x = 4\pi/4$ and
$x = 6\pi/4$. These are finite Fourier series! They contain $n = 4$ terms and they are
evaluated at $n = 4$ points. Those points $x = 0, 2\pi/4, 4\pi/4, 6\pi/4$ are equally spaced.

The next point would be $x = 8\pi/4$ which is $2\pi$. Then the series is back to $y_0$, because
$e^{2\pi i}$ is the same as $e^0 = 1$. Everything cycles around with period 4. In this world 2 + 2 is
0 because $(w^2)(w^2) = w^0 = 1$. We will follow the convention that $j$ and $k$ go from 0 to
$n - 1$ (instead of 1 to $n$). The “zeroth row” and “zeroth column” of $F$ contain all ones.

The $n$ by $n$ Fourier matrix contains powers of $w = e^{2\pi i/n}$:


$$F_n c = \begin{bmatrix} 1 & 1 & 1 & \cdots & 1 \\ 1 & w & w^2 & \cdots & w^{n-1} \\ 1 & w^2 & w^4 & \cdots & w^{2(n-1)} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & w^{n-1} & w^{2(n-1)} & \cdots & w^{(n-1)^2} \end{bmatrix}\begin{bmatrix} c_0 \\ c_1 \\ c_2 \\ \vdots \\ c_{n-1} \end{bmatrix} = \begin{bmatrix} y_0 \\ y_1 \\ y_2 \\ \vdots \\ y_{n-1} \end{bmatrix} = y. \quad (2)$$


$F_n$ is symmetric but not Hermitian. Its columns are orthogonal, and $\bar F_n F_n = nI$. Then
$F_n^{-1}$ is $\bar F_n/n$. The inverse contains powers of $\bar w_n = e^{-2\pi i/n}$. Look at the pattern in $F$:
the entry in row $j$, column $k$ is $w^{jk}$.


When we multiply $c$ by $F_n$, we sum the series at $n$ points. When we multiply $y$ by $F_n^{-1}$, we
find the coefficients $c$ from the function values $y$. In MATLAB that command is c = fft(y),
up to the factor $1/n$. The matrix $F$ passes from “frequency space” to “physical space.”

Important note. Many authors prefer to work with $\omega = e^{-2\pi i/N}$, which is the complex
conjugate of our $w$. (They often use the Greek omega, and I will do that to keep the two
options separate.) With this choice, their DFT matrix contains powers of $\omega$ not $w$. It is
conj($F$) = complex conjugate of our $F$. This takes us to frequency space.

$\bar F$ is a completely reasonable choice! MATLAB uses $\omega = e^{-2\pi i/N}$. The DFT matrix
fft(eye(N)) contains powers of this number $\omega = \bar w$. The Fourier matrix with $w$’s reconstructs
$y$ from $c$. The matrix $\bar F$ with $\omega$’s computes Fourier coefficients as fft(y).
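The two directions can be sketched in a few lines. The Python below is an illustration, not the book's code; `fourier_matrix`, `matvec`, and the sample coefficients `c` are assumptions. It builds $F_n$ from the entries $w^{jk}$, sums the series with $y = Fc$, and recovers $c$ with $F^{-1} = \bar F/n$.

```python
import cmath
import math

def fourier_matrix(n):
    """n by n Fourier matrix with entries w^(jk), w = e^(2 pi i / n)."""
    w = cmath.exp(2j * math.pi / n)
    return [[w ** (j * k) for k in range(n)] for j in range(n)]

def matvec(M, x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in M]

n = 4
F = fourier_matrix(n)
Fbar = [[entry.conjugate() for entry in row] for row in F]

c = [1, 2, 3, 4]                           # sample coefficients (an assumption)
y = matvec(F, c)                           # frequency space to physical space
c_back = [v / n for v in matvec(Fbar, y)]  # F^{-1} y = conj(F) y / n

print(y)        # y_0 = c_0 + c_1 + c_2 + c_3 = 10
print(c_back)   # recovers 1, 2, 3, 4 up to roundoff
```

The product `matvec(Fbar, y)` without the division by `n` corresponds to the convention with $\omega = \bar w$ described above.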


Also important. When a function $f(x)$ has period $2\pi$, and we change $x$ to $e^{i\theta}$,
the function is defined around the unit circle (where $z = e^{i\theta}$). Then the Discrete
Fourier Transform from $y$ to $c$ is matching $n$ values of this $f(z)$ by a polynomial
$p(z) = c_0 + c_1 z + \cdots + c_{n-1} z^{n-1}$.


Interpolation: Find $c_0, \ldots, c_{n-1}$ so that $p(z) = f(z)$ at $n$ points $z = 1, w, \ldots, w^{n-1}$.


The Fourier matrix is the Vandermonde matrix for interpolation at those n points. 


One Step of the Fast Fourier Transform 


We want to multiply $F$ times $c$ as quickly as possible. Normally a matrix times a vector
takes $n^2$ separate multiplications, since the matrix has $n^2$ entries. You might think it is
impossible to do better. (If the matrix has zero entries then multiplications can be skipped. But
the Fourier matrix has no zeros!) By using the special pattern $w^{jk}$ for its entries, $F$ can be
factored in a way that produces many zeros. This is the FFT.

The key idea is to connect $F_n$ with the half-size Fourier matrix $F_{n/2}$. Assume that $n$
is a power of 2 (say $n = 2^{10} = 1024$). We will connect $F_{1024}$ to $F_{512}$, or rather to two
copies of $F_{512}$. When $n = 4$, the key is in the relation between these matrices:


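$$F_4 = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & i & i^2 & i^3 \\ 1 & i^2 & i^4 & i^6 \\ 1 & i^3 & i^6 & i^9 \end{bmatrix} \quad\text{and}\quad \begin{bmatrix} F_2 & \\ & F_2 \end{bmatrix} = \begin{bmatrix} 1 & 1 & & \\ 1 & i^2 & & \\ & & 1 & 1 \\ & & 1 & i^2 \end{bmatrix}$$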

On the left is F4, with no zeros. On the right is a matrix that is half zero. The work is cut 
in half. But wait, those matrices are not the same. We need two sparse and simple matrices 
to complete the FFT factorization: 


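$$\text{Factors for FFT}\qquad F_4 = \begin{bmatrix} 1 & & 1 & \\ & 1 & & i \\ 1 & & -1 & \\ & 1 & & -i \end{bmatrix}\begin{bmatrix} 1 & 1 & & \\ 1 & i^2 & & \\ & & 1 & 1 \\ & & 1 & i^2 \end{bmatrix}\begin{bmatrix} 1 & & & \\ & & 1 & \\ & 1 & & \\ & & & 1 \end{bmatrix} \quad (3)$$


The last matrix is a permutation. It puts the even $c$’s ($c_0$ and $c_2$) ahead of the odd $c$’s ($c_1$
and $c_3$). The middle matrix performs half-size transforms $F_2$ and $F_2$ on the evens and
odds. The matrix at the left combines the two half-size outputs, in a way that produces
the correct full-size output $y = Fc$.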
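The three factors in equation (3) can be multiplied out to confirm that they reproduce $F_4$. This is a small Python sketch for illustration; the variable names `combine`, `half`, and `perm` are assumptions introduced here for the left, middle, and right factors.

```python
i = 1j

# Full 4 by 4 Fourier matrix with entries i^(jk).
F4 = [[i ** (j * k) for k in range(4)] for j in range(4)]

combine = [[1, 0,  1,  0],      # [I  D] with D = diag(1, i)
           [0, 1,  0,  i],
           [1, 0, -1,  0],      # [I -D]
           [0, 1,  0, -i]]
half = [[1,  1, 0,  0],         # two copies of F_2 (note i^2 = -1)
        [1, -1, 0,  0],
        [0,  0, 1,  1],
        [0,  0, 1, -1]]
perm = [[1, 0, 0, 0],           # even-odd permutation: c0, c2, c1, c3
        [0, 0, 1, 0],
        [0, 1, 0, 0],
        [0, 0, 0, 1]]

def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[r][k] * B[k][c] for k in range(len(B)))
             for c in range(len(B[0]))] for r in range(len(A))]

product = matmul(combine, matmul(half, perm))
print(all(abs(product[r][c] - F4[r][c]) < 1e-12
          for r in range(4) for c in range(4)))
```

Half of the entries in each factor are zero, which is where the saving comes from.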


The same idea applies when $n = 1024$ and $m = \frac12 n = 512$. The number $w$ is
$e^{2\pi i/1024}$. It is at the angle $\theta = 2\pi/1024$ on the unit circle. The Fourier matrix $F_{1024}$
is full of powers of $w$. The first stage of the FFT is the great factorization discovered by
Cooley and Tukey (and foreshadowed in 1805 by Gauss):

$$F_{1024} = \begin{bmatrix} I_{512} & D_{512} \\ I_{512} & -D_{512} \end{bmatrix}\begin{bmatrix} F_{512} & \\ & F_{512} \end{bmatrix}\begin{bmatrix} \text{even-odd} \\ \text{permutation} \end{bmatrix} \quad (4)$$

$I_{512}$ is the identity matrix. $D_{512}$ is the diagonal matrix with entries $(1, w, \ldots, w^{511})$. The
two copies of $F_{512}$ are what we expected. Don’t forget that they use the 512th root of unity
(which is nothing but $w^2$!!) The permutation matrix separates the incoming vector $c$ into
its even and odd parts $c' = (c_0, c_2, \ldots, c_{1022})$ and $c'' = (c_1, c_3, \ldots, c_{1023})$.
Here are the algebra formulas which say the same thing as the factorization of $F_{1024}$:


$$y_j = y'_j + (w_n)^j\, y''_j, \quad j = 0, \ldots, m-1$$
$$y_{j+m} = y'_j - (w_n)^j\, y''_j, \quad j = 0, \ldots, m-1 \quad (5)$$


Those formulas come from separating even $c_{2k}$ from odd $c_{2k+1}$:


$$y_j = \sum_{k=0}^{n-1} w^{jk} c_k = \sum_{k=0}^{m-1} w^{2jk} c_{2k} + \sum_{k=0}^{m-1} w^{j(2k+1)} c_{2k+1} \quad\text{with } m = \tfrac12 n. \quad (6)$$

The even $c$’s go into $c' = (c_0, c_2, \ldots)$ and the odd $c$’s go into $c'' = (c_1, c_3, \ldots)$. Then
come the transforms $F_m c'$ and $F_m c''$. The key is $w_n^2 = w_m$. This gives $w_n^{2jk} = w_m^{jk}$.

$$\text{Rewrite}\qquad y_j = \sum w_m^{jk} c'_k + (w_n)^j \sum w_m^{jk} c''_k = y'_j + (w_n)^j\, y''_j. \quad (7)$$


For $j \ge m$, the minus sign in (5) comes from factoring out $(w_n)^m = -1$.

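MATLAB easily separates even $c$’s from odd $c$’s and multiplies by $w_n^j$. We use conj(F)
or equivalently MATLAB’s inverse transform ifft, because fft is based on $\omega = \bar w = e^{-2\pi i/n}$.
Problem 16 shows that $F$ and conj(F) are linked by permuting rows.

FFT step from n to n/2 in MATLAB (c is a column vector; MATLAB indices start at 1, so c(1) holds $c_0$):

    w  = exp(2*pi*1i/n);             % w = e^(2*pi*i/n)
    y1 = ifft(c(1:2:n-1)) * n/2;     % y'  = F_(n/2) times the even c's
    y2 = ifft(c(2:2:n))   * n/2;     % y'' = F_(n/2) times the odd c's
    d  = w.^(0:n/2-1).';             % diagonal of D: 1, w, ..., w^(n/2-1)
    y  = [y1 + d.*y2; y1 - d.*y2];   % y = F_n c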

The flow graph shows $c'$ and $c''$ going through the half-size $F_2$. Those steps are called
“butterflies,” from their shape. Then the outputs $y'$ and $y''$ are combined (multiplying $y''$
by $1, i$ and also by $-1, -i$) to produce $y = F_4 c$.

This reduction from $F_n$ to two $F_m$’s almost cuts the work in half; you see the zeros in
the matrix factorization. That reduction is good but not great. The full idea of the FFT is
much more powerful. It saves much more than half the time.


[Flow graph: the inputs enter in the order $c_0$ (00), $c_2$ (10), $c_1$ (01), $c_3$ (11), with $c' = (c_0, c_2)$ and $c'' = (c_1, c_3)$. The outputs leave in the order $y_0$ (00), $y_1$ (01), $y_2$ (10), $y_3$ (11).]


The Full FFT by Recursion 


If you have read this far, you have probably guessed what comes next. We reduced $F_n$ to
$F_{n/2}$. Keep going to $F_{n/4}$. The matrices $F_{512}$ lead to $F_{256}$ (in four copies). Then 256 leads
to 128. That is recursion. It is a basic principle of many fast algorithms, and here is the
second stage with four copies of $F = F_{256}$ and $D = D_{256}$:


$$\begin{bmatrix} F_{512} & \\ & F_{512} \end{bmatrix} = \begin{bmatrix} I & D & & \\ I & -D & & \\ & & I & D \\ & & I & -D \end{bmatrix}\begin{bmatrix} F & & & \\ & F & & \\ & & F & \\ & & & F \end{bmatrix}\begin{bmatrix} \text{pick } 0, 4, 8, \ldots \\ \text{pick } 2, 6, 10, \ldots \\ \text{pick } 1, 5, 9, \ldots \\ \text{pick } 3, 7, 11, \ldots \end{bmatrix}$$

We will count the individual multiplications, to see how much is saved. Before the FFT
was invented, the count was the usual $n^2 = (1024)^2$. This is about a million multiplications.
I am not saying that they take a long time. The cost becomes large when we have
many, many transforms to do, which is typical. Then the saving by the FFT is also large:


The final count for size $n = 2^\ell$ is reduced from $n^2$ to $\frac12 n\ell$.


The number 1024 is $2^{10}$, so $\ell = 10$. The original count of $(1024)^2$ is reduced to
$(5)(1024)$. The saving is a factor of 200. A million is reduced to five thousand. That is why
the FFT has revolutionized signal processing.

Here is the reasoning behind $\frac12 n\ell$. There are $\ell$ levels, going from $n = 2^\ell$ down to
$n = 1$. Each level has $n/2$ multiplications from the diagonal $D$’s, to reassemble the
half-size outputs from the lower level. This yields the final count $\frac12 n\ell$, which is $\frac12 n \log_2 n$.

One last note about this remarkable algorithm. There is an amazing rule for the order
that the $c$’s enter the FFT, after all the even-odd permutations. Write the numbers 0 to
$n - 1$ in binary (base 2). Reverse the order of their digits. The complete picture shows the
bit-reversed order at the start, the $\ell = \log_2 n$ steps of the recursion, and the final output
$y_0, \ldots, y_{n-1}$ which is $F_n$ times $c$.
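The recursion, the multiplication count, and the bit-reversed order can all be sketched briefly. The Python below is an illustration under stated assumptions, not the book's code. The names `fft_recursive` and `bit_reversed_order` are hypothetical, the convention is $w = e^{2\pi i/n}$ as in this section, and only the multiplications by the diagonal $D$'s are counted.

```python
import cmath
import math

def fft_recursive(c):
    """Return (F_n c, count of multiplications by the diagonal D's).

    Assumes len(c) is a power of 2 and uses w = e^(2 pi i / n).
    """
    n = len(c)
    if n == 1:
        return list(c), 0
    m = n // 2
    y1, count1 = fft_recursive(c[0::2])      # even c's
    y2, count2 = fft_recursive(c[1::2])      # odd c's
    w = cmath.exp(2j * math.pi / n)
    t = [w ** j * y2[j] for j in range(m)]   # m multiplications by D
    y = [y1[j] + t[j] for j in range(m)] + [y1[j] - t[j] for j in range(m)]
    return y, count1 + count2 + m

def bit_reversed_order(n):
    """Order in which the c's enter after all even-odd permutations."""
    bits = n.bit_length() - 1
    return [int(format(k, f"0{bits}b")[::-1], 2) for k in range(n)]

y, count = fft_recursive([1] + [0] * 1023)
print(count)                    # (1/2) n log2(n) for n = 1024
print(bit_reversed_order(8))    # the c's enter as c0, c4, c2, c6, c1, c5, c3, c7
```

With the input $c = (1, 0, \ldots, 0)$ the output is the first column of $F$, which is all ones.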

The book ends with that very fundamental idea, a matrix multiplying a vector. 


Thank you for studying linear algebra. I hope you enjoyed it, and I very much hope you 
will use it. It was a pleasure to write about this tremendously useful subject. 


Problem Set 10.3 
1 Multiply the three matrices in equation (3) and compare with $F$. In which six entries
do you need to know that $i^2 = -1$?

2 Invert the three factors in equation (3) to find a fast factorization of $F^{-1}$.

3 $F$ is symmetric. So transpose equation (3) to find a new Fast Fourier Transform!

4 All entries in the factorization of $F_6$ involve powers of $w_6$ = sixth root of 1:

$$F_6 = \begin{bmatrix} I & D \\ I & -D \end{bmatrix}\begin{bmatrix} F_3 & \\ & F_3 \end{bmatrix}\begin{bmatrix} P \end{bmatrix}$$

Write down these matrices with $1, w_6, w_6^2$ in $D$ and $w_3 = w_6^2$ in $F_3$. Multiply!

5 If $v = (1, 0, 0, 0)$ and $w = (1, 1, 1, 1)$, show that $Fv = w$ and $Fw = 4v$. Therefore
$F^{-1}w = v$ and $F^{-1}v =$ ____.

6 What is $F^2$ and what is $F^4$ for the 4 by 4 Fourier matrix?

7 Put the vector $c = (1, 0, 1, 0)$ through the three steps of the FFT to find $y = Fc$. Do
the same for $c = (0, 1, 0, 1)$.

8 Compute $y = F_8 c$ by the three FFT steps for $c = (1, 0, 1, 0, 1, 0, 1, 0)$. Repeat the
computation for $c = (0, 1, 0, 1, 0, 1, 0, 1)$.

9 If $w = e^{2\pi i/64}$ then $w^2$ and $\sqrt w$ are among the ____ and ____ roots of 1.

10 (a) Draw all the sixth roots of 1 on the unit circle. Prove they add to zero.
(b) What are the three cube roots of 1? Do they also add to zero?

11 The columns of the Fourier matrix $F$ are the eigenvectors of the cyclic permutation
$P$. Multiply $PF$ to find the eigenvalues $\lambda_1$ to $\lambda_4$:

$$\begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 \end{bmatrix}\begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & i & i^2 & i^3 \\ 1 & i^2 & i^4 & i^6 \\ 1 & i^3 & i^6 & i^9 \end{bmatrix} = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & i & i^2 & i^3 \\ 1 & i^2 & i^4 & i^6 \\ 1 & i^3 & i^6 & i^9 \end{bmatrix}\begin{bmatrix} \lambda_1 & & & \\ & \lambda_2 & & \\ & & \lambda_3 & \\ & & & \lambda_4 \end{bmatrix}$$

This is $PF = F\Lambda$ or $P = F\Lambda F^{-1}$. The eigenvector matrix (usually $S$) is $F$.

12 The equation $\det(P - \lambda I) = 0$ is $\lambda^4 = 1$. This shows again that the eigenvalue
matrix $\Lambda$ is ____. Which permutation $P$ has eigenvalues = cube roots of 1?

13 (a) Two eigenvectors of $C$ are $(1, 1, 1, 1)$ and $(1, i, i^2, i^3)$. Find the eigenvalues.

$$\begin{bmatrix} c_0 & c_1 & c_2 & c_3 \\ c_3 & c_0 & c_1 & c_2 \\ c_2 & c_3 & c_0 & c_1 \\ c_1 & c_2 & c_3 & c_0 \end{bmatrix}\begin{bmatrix} 1 \\ 1 \\ 1 \\ 1 \end{bmatrix} = e_1\begin{bmatrix} 1 \\ 1 \\ 1 \\ 1 \end{bmatrix} \quad\text{and}\quad C\begin{bmatrix} 1 \\ i \\ i^2 \\ i^3 \end{bmatrix} = e_2\begin{bmatrix} 1 \\ i \\ i^2 \\ i^3 \end{bmatrix}$$

(b) $P = F\Lambda F^{-1}$ immediately gives $P^2 = F\Lambda^2 F^{-1}$ and $P^3 = F\Lambda^3 F^{-1}$. Then
$C = c_0 I + c_1 P + c_2 P^2 + c_3 P^3 = F(c_0 I + c_1\Lambda + c_2\Lambda^2 + c_3\Lambda^3)F^{-1} = FEF^{-1}$.
That matrix $E$ in parentheses is diagonal. It contains the ____ of $C$.

14 Find the eigenvalues of the “periodic” $-1, 2, -1$ matrix from $E = 2I - \Lambda - \Lambda^3$,
with the eigenvalues of $P$ in $\Lambda$. The $-1$’s in the corners make this matrix periodic:

$$C = \begin{bmatrix} 2 & -1 & 0 & -1 \\ -1 & 2 & -1 & 0 \\ 0 & -1 & 2 & -1 \\ -1 & 0 & -1 & 2 \end{bmatrix} \text{ has } c_0 = 2,\ c_1 = -1,\ c_2 = 0,\ c_3 = -1.$$

15 Fast convolution. To multiply $C$ times a vector $x$, we can multiply $F(E(F^{-1}x))$
instead. The direct way uses $n^2$ separate multiplications. Knowing $E$ and $F$, the
second way uses only $n \log_2 n + n$ multiplications. How many of those come from
$E$, how many from $F$, and how many from $F^{-1}$?

16 Why is row $i$ of $\bar F$ the same as row $N - i$ of $F$ (numbered 0 to $N - 1$)?

Solutions to Selected Exercises 


Problem Set 1.1, page 8 


1 The combinations give (a) a line in $\mathbf{R}^3$ (b) a plane in $\mathbf{R}^3$ (c) all of $\mathbf{R}^3$.
4 $3v + w = (7, 5)$ and $cv + dw = (2c + d, c + 2d)$.
6 The components of every $cv + dw$ add to zero. $c = 3$ and $d = 9$ give $(3, 3, -6)$.
9 The fourth corner can be $(4, 4)$ or $(4, 0)$ or $(-2, 2)$.
11 Four more corners $(1, 1, 0)$, $(1, 0, 1)$, $(0, 1, 1)$, $(1, 1, 1)$. The center point is $(\frac12, \frac12, \frac12)$.
Centers of faces are $(\frac12, \frac12, 0)$, $(\frac12, \frac12, 1)$ and $(0, \frac12, \frac12)$, $(1, \frac12, \frac12)$ and $(\frac12, 0, \frac12)$, $(\frac12, 1, \frac12)$.

12 A four-dimensional cube has $2^4 = 16$ corners and $2 \cdot 4 = 8$ three-dimensional faces
and 24 two-dimensional faces and 32 edges in Worked Example 2.4 A.

13 Sum = zero vector. Sum = $-2$:00 vector = 8:00 vector. 2:00 is 30° from horizontal
$= (\cos\frac{\pi}{6}, \sin\frac{\pi}{6}) = (\sqrt 3/2, 1/2)$.
16 All combinations with $c + d = 1$ are on the line that passes through $v$ and $w$.
The point $V = -v + 2w$ is on that line but it is beyond $w$.

17 All vectors $cv + cw$ are on the line passing through $(0, 0)$ and $u = \frac12 v + \frac12 w$. That
line continues out beyond $v + w$ and back beyond $(0, 0)$. With $c \ge 0$, half of this line
is removed, leaving a ray that starts at $(0, 0)$.

20 (a) $\frac13 u + \frac13 v + \frac13 w$ is the center of the triangle between $u$, $v$ and $w$; $\frac12 u + \frac12 w$ lies
between $u$ and $w$ (b) To fill the triangle keep $c \ge 0$, $d \ge 0$, $e \ge 0$, and $c + d + e = 1$.

22 The vector $\frac12(u + v + w)$ is outside the pyramid because $c + d + e = \frac12 + \frac12 + \frac12 > 1$.

25 (a) For a line, choose $u = v = w$ = any nonzero vector (b) For a plane, choose
$u$ and $v$ in different directions. A combination like $w = u + v$ is in the same plane.

Problem Set 1.2, page 19 


3 Unit vectors $v/\|v\| = (\frac35, \frac45) = (.6, .8)$ and $w/\|w\| = (\frac45, \frac35) = (.8, .6)$. The cosine
of $\theta$ is $\frac{v}{\|v\|} \cdot \frac{w}{\|w\|} = \frac{24}{25}$. The vectors $w, u, -w$ make 0°, 90°, 180° angles with $w$.

4 (a) $v \cdot (-v) = -1$ (b) $(v + w) \cdot (v - w) = v \cdot v + w \cdot v - v \cdot w - w \cdot w =
1 + (\ ) - (\ ) - 1 = 0$ so $\theta = 90°$ (notice $v \cdot w = w \cdot v$) (c) $(v - 2w) \cdot (v + 2w) =
v \cdot v - 4w \cdot w = 1 - 4 = -3$.
6 All vectors $w = (c, 2c)$ are perpendicular to $v$. All vectors $(x, y, z)$ with $x + y + z = 0$
lie on a plane. All vectors perpendicular to $(1, 1, 1)$ and $(1, 2, 3)$ lie on a line.
9 If $v_2 w_2 / v_1 w_1 = -1$ then $v_2 w_2 = -v_1 w_1$ or $v_1 w_1 + v_2 w_2 = v \cdot w = 0$: perpendicular!
11 $v \cdot w < 0$ means angle > 90°; these $w$’s fill half of 3-dimensional space.
12 $(1, 1)$ perpendicular to $(1, 5) - c(1, 1)$ if $6 - 2c = 0$ or $c = 3$; $v \cdot (w - cv) = 0$ if
$c = v \cdot w / v \cdot v$. Subtracting $cv$ is the key to perpendicular vectors.
15 $\frac12(x + y) = (2 + 8)/2 = 5$; $\cos\theta = 2\sqrt{16}/\sqrt{10}\sqrt{10} = 8/10$.
17 $\cos\alpha = 1/\sqrt 2$, $\cos\beta = 0$, $\cos\gamma = -1/\sqrt 2$. For any vector $v$, $\cos^2\alpha + \cos^2\beta + \cos^2\gamma
= (v_1^2 + v_2^2 + v_3^2)/\|v\|^2 = 1$.
21 $2v \cdot w \le 2\|v\|\|w\|$ leads to $\|v + w\|^2 = v \cdot v + 2v \cdot w + w \cdot w \le \|v\|^2 + 2\|v\|\|w\| + \|w\|^2$.
This is $(\|v\| + \|w\|)^2$. Taking square roots gives $\|v + w\| \le \|v\| + \|w\|$.
22 $v_1^2 w_1^2 + 2v_1 w_1 v_2 w_2 + v_2^2 w_2^2 \le v_1^2 w_1^2 + v_1^2 w_2^2 + v_2^2 w_1^2 + v_2^2 w_2^2$ is true (cancel 4 terms)
because the difference is $v_1^2 w_2^2 + v_2^2 w_1^2 - 2v_1 w_1 v_2 w_2$ which is $(v_1 w_2 - v_2 w_1)^2 \ge 0$.
23 $\cos\beta = w_1/\|w\|$ and $\sin\beta = w_2/\|w\|$. Then $\cos(\beta - \alpha) = \cos\beta\cos\alpha + \sin\beta\sin\alpha =
v_1 w_1/\|v\|\|w\| + v_2 w_2/\|v\|\|w\| = v \cdot w/\|v\|\|w\|$. This is $\cos\theta$ because $\beta - \alpha = \theta$.
24 Example 6 gives $|u_1||U_1| \le \frac12(u_1^2 + U_1^2)$ and $|u_2||U_2| \le \frac12(u_2^2 + U_2^2)$. The whole line
becomes $.96 \le (.6)(.8) + (.8)(.6) \le \frac12(.6^2 + .8^2) + \frac12(.8^2 + .6^2) = 1$. True: $.96 < 1$.
28 Three vectors in the plane could make angles > 90° with each other: $(1, 0)$, $(-1, 4)$,
$(-1, -4)$. Four vectors could not do this (360° total angle). How many can do this in
$\mathbf{R}^3$ or $\mathbf{R}^n$?

29 Try $v = (1, 2, -3)$ and $w = (-3, 1, 2)$ with $\cos\theta = \frac{-7}{14}$ and $\theta = 120°$. Write
$v \cdot w = xz + yz + xy$ as $\frac12(x + y + z)^2 - \frac12(x^2 + y^2 + z^2)$. If $x + y + z = 0$ this
is $-\frac12(x^2 + y^2 + z^2) = -\frac12\|v\|\|w\|$. Then $v \cdot w/\|v\|\|w\| = -\frac12$.

Problem Set 1.3, page 29 


1 $2s_1 + 3s_2 + 4s_3 = (2, 5, 9)$. The same vector $b$ comes from $S$ times $x = (2, 3, 4)$:

$$\begin{bmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 1 & 1 & 1 \end{bmatrix}\begin{bmatrix} 2 \\ 3 \\ 4 \end{bmatrix} = \begin{bmatrix} (\text{row 1}) \cdot x \\ (\text{row 2}) \cdot x \\ (\text{row 3}) \cdot x \end{bmatrix} = \begin{bmatrix} 2 \\ 5 \\ 9 \end{bmatrix}.$$

2 The solutions are $y_1 = 1$, $y_2 = 0$, $y_3 = 0$ (right side = column 1) and $y_1 = 1$, $y_2 = 3$,
$y_3 = 5$. That second example illustrates that the first $n$ odd numbers add to $n^2$.

4 The combination $0w_1 + 0w_2 + 0w_3$ always gives the zero vector, but this problem
looks for other zero combinations (then the vectors are dependent, they lie in a plane):
$w_2 = (w_1 + w_3)/2$ so one combination that gives zero is $\frac12 w_1 - w_2 + \frac12 w_3$.

5 The rows of the 3 by 3 matrix in Problem 4 must also be dependent: $r_2 = \frac12(r_1 + r_3)$.
The column and row combinations that produce 0 are the same: this is unusual.

7 All three rows are perpendicular to the solution $x$ (the three equations $r_1 \cdot x = 0$ and
$r_2 \cdot x = 0$ and $r_3 \cdot x = 0$ tell us this). Then the whole plane of the rows is perpendicular
to $x$ (the plane is also perpendicular to all multiples $cx$).

9 The cyclic difference matrix $C$ has a line of solutions (in 4 dimensions) to $Cx = 0$:

$$\begin{bmatrix} 1 & 0 & 0 & -1 \\ -1 & 1 & 0 & 0 \\ 0 & -1 & 1 & 0 \\ 0 & 0 & -1 & 1 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \end{bmatrix} \text{ when } x = \begin{bmatrix} c \\ c \\ c \\ c \end{bmatrix} = \text{any constant vector.}$$

11 The forward differences of the squares are $(t + 1)^2 - t^2 = t^2 + 2t + 1 - t^2 = 2t + 1$.
Differences of the $n$th power are $(t + 1)^n - t^n = t^n - t^n + nt^{n-1} + \cdots$. The leading
term is the derivative $nt^{n-1}$. The binomial theorem gives all the terms of $(t + 1)^n$.

12 Centered difference matrices of even size seem to be invertible. Look at eqns. 1 and 4:

$$\begin{bmatrix} 0 & 1 & 0 & 0 \\ -1 & 0 & 1 & 0 \\ 0 & -1 & 0 & 1 \\ 0 & 0 & -1 & 0 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix} = \begin{bmatrix} b_1 \\ b_2 \\ b_3 \\ b_4 \end{bmatrix} \quad\text{First solve } x_2 = b_1,\ -x_3 = b_4 \quad \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix} = \begin{bmatrix} -b_2 - b_4 \\ b_1 \\ -b_4 \\ b_1 + b_3 \end{bmatrix}$$

Add equations 1, 3, 5. The left side of the sum is zero. The right side is $b_1 + b_3 + b_5$.
There cannot be a solution unless $b_1 + b_3 + b_5 = 0$.

14 An example is $(a, b) = (3, 6)$ and $(c, d) = (1, 2)$. The ratios $a/c$ and $b/d$ are equal.
Then $ad = bc$. Then (when you divide by $bd$) the ratios $a/b$ and $c/d$ are equal!


Problem Set 2.1, page 40 


1 The columns are $i = (1,0,0)$ and $j = (0,1,0)$ and $k = (0,0,1)$ and $b = (2,3,4) = 2i + 3j + 4k$.
2 The planes are the same: $2x = 4$ is $x = 2$, $3y = 9$ is $y = 3$, and $4z = 16$ is $z = 4$. The solution is the same point $X = x$. The columns are changed; but same combination.
4 If $z = 2$ then $x + y = 0$ and $x - y = z$ give the point $(1,-1,2)$. If $z = 0$ then $x + y = 6$ and $x - y = 4$ produce $(5,1,0)$. Halfway between those is $(3,0,1)$.
6 Equation 1 + equation 2 − equation 3 is now $0 = -4$. Line misses plane; no solution.
8 Four planes in 4-dimensional space normally meet at a point. The solution to $Ax = (3,3,3,2)$ is $x = (0,0,1,2)$ if $A$ has columns $(1,0,0,0)$, $(1,1,0,0)$, $(1,1,1,0)$, $(1,1,1,1)$. The equations are $x + y + z + t = 3$, $y + z + t = 3$, $z + t = 3$, $t = 2$.
11 $Ax$ equals $(14, 22)$ and $(0, 0)$ and $(9, 7)$.
14 $2x + 3y + z + 5t = 8$ is $Ax = b$ with the 1 by 4 matrix $A = [\,2 \ \ 3 \ \ 1 \ \ 5\,]$. The solutions $x$ fill a 3D "plane" in 4 dimensions. It could be called a hyperplane.


16 90° rotation from $R = \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}$, 180° rotation from $R^2 = \begin{bmatrix} -1 & 0 \\ 0 & -1 \end{bmatrix} = -I$.
18 $E = \begin{bmatrix} 1 & 0 \\ -1 & 1 \end{bmatrix}$ and $E = \begin{bmatrix} 1 & 0 & 0 \\ -1 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$ subtract the first component from the second.

22 The dot product $Ax = [\,1 \ \ 4 \ \ 5\,] \begin{bmatrix} x \\ y \\ z \end{bmatrix}$ = (1 by 3)(3 by 1) is zero for points $(x, y, z)$ on a plane in three dimensions. The columns of $A$ are one-dimensional vectors.

23 A = [1 2 ; 3 4] and x = [5 -2]' and b = [1 7]'. r = b - A*x prints as zero.

25 ones(4,4) * ones(4,1) = [4 4 4 4]'; B * w = [10 10 10 10]'.

28 The row picture shows four lines in the 2D plane. The column picture is in four-dimensional space. No solution unless the right side is a combination of the two columns.

29 $u_7$, $v_7$, $w_7$ are all close to $(.6, .4)$. Their components still add to 1.

30 $\begin{bmatrix} .8 & .3 \\ .2 & .7 \end{bmatrix} \begin{bmatrix} .6 \\ .4 \end{bmatrix} = \begin{bmatrix} .6 \\ .4 \end{bmatrix}$ = steady state $s$. No change when multiplied by $\begin{bmatrix} .8 & .3 \\ .2 & .7 \end{bmatrix}$.

31 $M = \begin{bmatrix} 8 & 3 & 4 \\ 1 & 5 & 9 \\ 6 & 7 & 2 \end{bmatrix} = \begin{bmatrix} 5+u & 5-u+v & 5-v \\ 5-u-v & 5 & 5+u+v \\ 5+v & 5+u-v & 5-u \end{bmatrix}$; $M_3(1,1,1) = (15,15,15)$; $M_4(1,1,1,1) = (34,34,34,34)$ because $1 + 2 + \cdots + 16 = 136$ which is $4(34)$.

32 $A$ is singular when its third column $w$ is a combination $cu + dv$ of the first columns. A typical column picture has $b$ outside the plane of $u$, $v$, $w$. A typical row picture has the intersection line of two planes parallel to the third plane. Then no solution.

33 $w = (5,7)$ is $5u + 7v$. Then $Aw$ equals 5 times $Au$ plus 7 times $Av$.

34 $\begin{bmatrix} 2 & -1 & 0 & 0 \\ -1 & 2 & -1 & 0 \\ 0 & -1 & 2 & -1 \\ 0 & 0 & -1 & 2 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix} = \begin{bmatrix} 1 \\ 2 \\ 3 \\ 4 \end{bmatrix}$ has the solution $\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix} = \begin{bmatrix} 4 \\ 7 \\ 8 \\ 6 \end{bmatrix}$.

35 $x = (1, \ldots, 1)$ gives $Sx$ = sum of each row = $1 + \cdots + 9 = 45$ for Sudoku matrices. 6 row orders $(1,2,3)$, $(1,3,2)$, $(2,1,3)$, $(2,3,1)$, $(3,1,2)$, $(3,2,1)$ are in Section 2.7. The same 6 permutations of blocks of rows produce Sudoku matrices, so $6^4 = 1296$ orders of the 9 rows all stay Sudoku. (And also 1296 permutations of the 9 columns.)
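
The answer to Problem 34 can be checked by carrying out elimination and back substitution directly. The following is only a minimal sketch in Python (the function name `solve` and the use of exact fractions are choices made here for illustration, not something taken from the book or its Teaching Codes):

```python
from fractions import Fraction

def solve(A, b):
    """Solve Ax = b by elimination (with row exchanges) and back substitution."""
    n = len(A)
    # Augmented matrix [A b] with exact rational entries
    M = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(A, b)]
    for k in range(n):
        # Look for a nonzero pivot in column k, at or below row k
        p = next((i for i in range(k, n) if M[i][k] != 0), None)
        if p is None:
            raise ValueError("singular matrix: no pivot in column %d" % (k + 1))
        M[k], M[p] = M[p], M[k]
        for i in range(k + 1, n):
            m = M[i][k] / M[k][k]  # multiplier: subtract m times row k from row i
            M[i] = [a - m * c for a, c in zip(M[i], M[k])]
    # Back substitution on the upper triangular system
    x = [Fraction(0)] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

# The -1, 2, -1 matrix and right side from Problem 34
K = [[2, -1, 0, 0], [-1, 2, -1, 0], [0, -1, 2, -1], [0, 0, -1, 2]]
x = solve(K, [1, 2, 3, 4])
print([int(v) for v in x])
```

Exact fractions are used so that the pivots and multipliers match a hand calculation with no roundoff.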


Problem Set 2.2, page 51 


3 Subtract $-\frac{1}{2}$ (or add $\frac{1}{2}$) times equation 1. The new second equation is $3y = 3$. Then $y = 1$ and $x = 5$. If the right side changes sign, so does the solution: $(x, y) = (-5, -1)$.

4 Subtract $\ell = \frac{c}{a}$ times equation 1. The new second pivot multiplying $y$ is $d - (cb/a)$ or $(ad - bc)/a$. Then $y = (ag - cf)/(ad - bc)$.

6 Singular system if $b = 4$, because $4x + 8y$ is 2 times $2x + 4y$. Then $g = 32$ makes the lines become the same: infinitely many solutions like $(8, 0)$ and $(0, 4)$.

8 If $k = 3$ elimination must fail: no solution. If $k = -3$, elimination gives $0 = 0$ in equation 2: infinitely many solutions. If $k = 0$ a row exchange is needed: one solution.

14 Subtract 2 times row 1 from row 2 to reach $(d - 10)y - z = 2$. Equation (3) is $y - z = 3$. If $d = 10$ exchange rows 2 and 3. If $d = 11$ the system becomes singular.

15 The second pivot position will contain $-2 - b$. If $b = -2$ we exchange with row 3. If $b = -1$ (singular case) the second equation is $-y - z = 0$. A solution is $(1, 1, -1)$.


17 If row 1 = row 2, then row 2 is zero after the first step; exchange the zero row with row 
3 and there is no third pivot. If column 2 = column 1, then column 2 has no pivot. 


19 Row 2 becomes $3y - 4z = 5$, then row 3 becomes $(q + 4)z = t - 5$. If $q = -4$ the system is singular: no third pivot. Then if $t = 5$ the third equation is $0 = 0$. Choosing $z = 1$ the equation $3y - 4z = 5$ gives $y = 3$ and equation 1 gives $x = -9$.


20 Singular if row 3 is a combination of rows 1 and 2. From the end view, the three planes form a triangle. This happens if rows 1 + 2 = row 3 on the left side but not the right side: $x + y + z = 0$, $x - 2y - z = 1$, $2x - y = 4$. No parallel planes but still no solution.


25 a = 2 (equal columns), a = 4 (equal rows), a = 0 (zero column). 
28 A(2,:) = A(2,:) - 3 * A(1,:) will subtract 3 times row 1 from row 2.


29 Pivots 2 and 3 can be arbitrarily large. I believe their averages are infinite! With row 
exchanges in MATLAB’s lu code, the averages are much more stable (and should be 
predictable, also for randn with normal instead of uniform probability distribution). 


30 If A(5, 5) is 7 not 11, then the last pivot will be 0 not 4. 


31 Row $j$ of $U$ is a combination of rows $1, \ldots, j$ of $A$. If $Ax = 0$ then $Ux = 0$ (not true if $b$ replaces 0). $U$ is the diagonal of $A$ when $A$ is lower triangular.


Problem Set 2.3, page 63 
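1 $E_{21} = \begin{bmatrix} 1 & 0 & 0 \\ -5 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$, $E_{32} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 7 & 1 \end{bmatrix}$, $P = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{bmatrix}$.

3 $\begin{bmatrix} 1 & 0 & 0 \\ -4 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$, $\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 2 & 0 & 1 \end{bmatrix}$, $\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -2 & 1 \end{bmatrix}$; $M = E_{32}E_{31}E_{21} = \begin{bmatrix} 1 & 0 & 0 \\ -4 & 1 & 0 \\ 10 & -2 & 1 \end{bmatrix}$.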


5 Changing a33 from 7 to 11 will change the third pivot from 5 to 9. Changing a33 from 
7 to 2 will change the pivot from 5 to no pivot. 


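9 $M = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ -1 & 1 & 0 \end{bmatrix}$. After the exchange, we need $E_{31}$ (not $E_{21}$) to act on the new row 3.

10 $E_{13} = \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$; $\begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 1 \end{bmatrix}$; $E_{13}E_{31} = \begin{bmatrix} 2 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 1 \end{bmatrix}$. Test on the identity matrix!

12 The first product is $\begin{bmatrix} 9 & 8 & 7 \\ 6 & 5 & 4 \\ 3 & 2 & 1 \end{bmatrix}$ (rows and also columns reversed). The second product is $\begin{bmatrix} 1 & 2 & 3 \\ 0 & 1 & -2 \\ 0 & 2 & -3 \end{bmatrix}$.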


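14 $E_{21}$ has $-\ell_{21} = \frac{1}{2}$, $E_{32}$ has $-\ell_{32} = \frac{2}{3}$, $E_{43}$ has $-\ell_{43} = \frac{3}{4}$. Otherwise the $E$'s match $I$.

18 $EF = \begin{bmatrix} 1 & 0 & 0 \\ a & 1 & 0 \\ b & c & 1 \end{bmatrix}$, $FE = \begin{bmatrix} 1 & 0 & 0 \\ a & 1 & 0 \\ b + ac & c & 1 \end{bmatrix}$, $E^2 = \begin{bmatrix} 1 & 0 & 0 \\ 2a & 1 & 0 \\ 2b & 0 & 1 \end{bmatrix}$, $F^3 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 3c & 1 \end{bmatrix}$.

22 (a) $\sum a_{3j}x_j$ (b) $a_{21} - a_{11}$ (c) $a_{21} - 2a_{11}$ (d) $(E_{21}Ax)_1 = (Ax)_1 = \sum a_{1j}x_j$.
25 The last equation becomes $0 = 3$. If the original 6 is 3, then row 1 + row 2 = row 3.
27 (a) No solution if $d = 0$ and $c \neq 0$ (b) Many solutions if $d = 0 = c$. No effect from $a$, $b$.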
28 A= AI = A(BC) = (AB)C = IC = C. That middle equation is crucial. 


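30 $EM = \begin{bmatrix} 3 & 4 \\ 2 & 3 \end{bmatrix}$ then $FEM = \begin{bmatrix} 1 & 1 \\ 2 & 3 \end{bmatrix}$ then $EFEM = \begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix}$ then $EEFEM = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix} = B$. So after inverting with $E^{-1} = A$ and $F^{-1} = B$ this is $M = ABAAB$.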


Problem Set 2.4, page 75 


2 (a) $A$ (column 3 of $B$) (b) (Row 1 of $A$) $B$ (c) (Row 3 of $A$)(column 4 of $B$) (d) (Row 1 of $C$)$D$(column 1 of $E$).


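5 (a) $A^2 = \begin{bmatrix} 1 & 2b \\ 0 & 1 \end{bmatrix}$ and $A^n = \begin{bmatrix} 1 & nb \\ 0 & 1 \end{bmatrix}$ (b) $A^2 = \begin{bmatrix} 4 & 4 \\ 0 & 0 \end{bmatrix}$ and $A^n = \begin{bmatrix} 2^n & 2^n \\ 0 & 0 \end{bmatrix}$.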


7 (a) True (b) False (c) True (d) False. 


9 $AF = \begin{bmatrix} a & a+b \\ c & c+d \end{bmatrix}$ and $E(AF) = (EA)F$: Matrix multiplication is associative.

11 (a) $B = 4I$ (b) $B = 0$ (c) $B = \begin{bmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{bmatrix}$ (d) Every row of $B$ is 1, 0, 0.

15 (a) $mn$ (use every entry of $A$) (b) $mnp = p \times$ part (a) (c) $n^3$ ($n^2$ dot products).
16 (a) Use only column 2 of $B$ (b) Use only row 2 of $A$ (c)–(d) Use row 2 of first $A$.
18 Diagonal matrix, lower triangular, symmetric, all rows equal. Zero matrix fits all four.

19 (a) $a_{11}$ (b) $\ell_{31} = a_{31}/a_{11}$ (c) $a_{32} - \left(\frac{a_{31}}{a_{11}}\right)a_{12}$ (d) $a_{22} - \left(\frac{a_{21}}{a_{11}}\right)a_{12}$.


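22 $A = \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}$ has $A^2 = -I$; $BC = \begin{bmatrix} 1 & -1 \\ 1 & -1 \end{bmatrix} \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}$; $DE = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix} = \begin{bmatrix} -1 & 0 \\ 0 & 1 \end{bmatrix} = -ED$. You can find more examples.

24 $(A_1)^n = \begin{bmatrix} 2^n & 2^n - 1 \\ 0 & 1 \end{bmatrix}$, $(A_2)^n = 2^{n-1} \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}$, $(A_3)^n = \begin{bmatrix} a^n & a^{n-1}b \\ 0 & 0 \end{bmatrix}$.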


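27 (a) (row 3 of $A$) $\cdot$ (column 1 of $B$) and (row 3 of $A$) $\cdot$ (column 2 of $B$) are both zero.
(b) $\begin{bmatrix} x \\ x \\ 0 \end{bmatrix} \begin{bmatrix} 0 & x & x \end{bmatrix} = \begin{bmatrix} 0 & x & x \\ 0 & x & x \\ 0 & 0 & 0 \end{bmatrix}$ and $\begin{bmatrix} x \\ x \\ x \end{bmatrix} \begin{bmatrix} 0 & 0 & x \end{bmatrix} = \begin{bmatrix} 0 & 0 & x \\ 0 & 0 & x \\ 0 & 0 & x \end{bmatrix}$: both upper.

28 $A$ times $B$ with cuts: [diagrams of the four block partitions]

30 In 29, $c = \begin{bmatrix} -2 \\ 8 \end{bmatrix}$, $D = \begin{bmatrix} 0 & 1 \\ 5 & 3 \end{bmatrix}$, $D - cb/a = \begin{bmatrix} 1 & 1 \\ 1 & 3 \end{bmatrix}$ in the lower corner of $EA$.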


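32 $A$ times $X = [\,x_1 \ \ x_2 \ \ x_3\,]$ will be the identity matrix $I = [\,Ax_1 \ \ Ax_2 \ \ Ax_3\,]$.

33 $b = \begin{bmatrix} 3 \\ 5 \\ 8 \end{bmatrix}$ gives $x = 3x_1 + 5x_2 + 8x_3 = \begin{bmatrix} 3 \\ 8 \\ 16 \end{bmatrix}$; $A = \begin{bmatrix} 1 & 0 & 0 \\ -1 & 1 & 0 \\ 0 & -1 & 1 \end{bmatrix}$ will have those $x_1 = (1,1,1)$, $x_2 = (0,1,1)$, $x_3 = (0,0,1)$ as columns of its "inverse" $A^{-1}$.

35 $A = \begin{bmatrix} 0 & 1 & 0 & 1 \\ 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \\ 1 & 0 & 1 & 0 \end{bmatrix}$, $A^2 = \begin{bmatrix} 2 & 0 & 2 & 0 \\ 0 & 2 & 0 & 2 \\ 2 & 0 & 2 & 0 \\ 0 & 2 & 0 & 2 \end{bmatrix}$. These show 16 2-step paths in the graph:
aba, ada    cba, cda
bab, bcb    dab, dcb
abc, adc    cbc, cdc
bad, bcd    dad, dcd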
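
The path count in Problem 35 can be confirmed by squaring the adjacency matrix. This is a small sketch only; the helper name `matmul` is an assumption of this example, and the node order a, b, c, d is assumed to match the rows and columns of $A$:

```python
def matmul(A, B):
    """Product AB: entry (i, j) is (row i of A) dot (column j of B)."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

# Adjacency matrix of the 4-node graph (nodes assumed ordered a, b, c, d)
A = [[0, 1, 0, 1],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [1, 0, 1, 0]]

A2 = matmul(A, A)
# Entry (i, j) of A^2 counts the 2-step paths from node j to node i
total_paths = sum(sum(row) for row in A2)
print(A2, total_paths)
```

The entries of `A2` add up to the 16 two-step paths listed in the answer.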
Problem Set 2.5, page 89 


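1 $A^{-1} = \begin{bmatrix} 0 & \frac{1}{4} \\ \frac{1}{3} & 0 \end{bmatrix}$ and $B^{-1} = \begin{bmatrix} \frac{1}{2} & 0 \\ -1 & \frac{1}{2} \end{bmatrix}$ and $C^{-1} = \begin{bmatrix} 7 & -4 \\ -5 & 3 \end{bmatrix}$.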


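7 (a) In $Ax = (1,0,0)$, equation 1 + equation 2 − equation 3 is $0 = 1$ (b) Right sides must satisfy $b_1 + b_2 = b_3$ (c) Row 3 becomes a row of zeros: no third pivot.

8 (a) The vector $x = (1,1,-1)$ solves $Ax = 0$ (b) After elimination, columns 1 and 2 end in zeros. Then so does column 3 = column 1 + 2: no third pivot.

12 Multiply $C = AB$ on the left by $A^{-1}$ and on the right by $C^{-1}$. Then $A^{-1} = BC^{-1}$.

14 $B^{-1} = A^{-1} \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix}^{-1} = A^{-1} \begin{bmatrix} 1 & 0 \\ -1 & 1 \end{bmatrix}$: subtract column 2 of $A^{-1}$ from column 1.

16 $\begin{bmatrix} a & b \\ c & d \end{bmatrix} \begin{bmatrix} d & -b \\ -c & a \end{bmatrix} = \begin{bmatrix} ad - bc & 0 \\ 0 & ad - bc \end{bmatrix}$. The inverse of each matrix is the other divided by $ad - bc$.

18 $A^2B = I$ can also be written as $A(AB) = I$. Therefore $A^{-1}$ is $AB$.
21 Six of the sixteen 0–1 matrices are invertible, including all four with three 1's.

22 $\begin{bmatrix} 1 & 3 & 1 & 0 \\ 2 & 7 & 0 & 1 \end{bmatrix} \to \begin{bmatrix} 1 & 3 & 1 & 0 \\ 0 & 1 & -2 & 1 \end{bmatrix} \to \begin{bmatrix} 1 & 0 & 7 & -3 \\ 0 & 1 & -2 & 1 \end{bmatrix} = [\,I \ \ A^{-1}\,]$;
$\begin{bmatrix} 1 & 4 & 1 & 0 \\ 3 & 9 & 0 & 1 \end{bmatrix} \to \begin{bmatrix} 1 & 4 & 1 & 0 \\ 0 & -3 & -3 & 1 \end{bmatrix} \to \begin{bmatrix} 1 & 0 & -3 & 4/3 \\ 0 & 1 & 1 & -1/3 \end{bmatrix} = [\,I \ \ A^{-1}\,]$.

24 $\begin{bmatrix} 1 & a & b & 1 & 0 & 0 \\ 0 & 1 & c & 0 & 1 & 0 \\ 0 & 0 & 1 & 0 & 0 & 1 \end{bmatrix} \to \begin{bmatrix} 1 & a & 0 & 1 & 0 & -b \\ 0 & 1 & 0 & 0 & 1 & -c \\ 0 & 0 & 1 & 0 & 0 & 1 \end{bmatrix} \to \begin{bmatrix} 1 & 0 & 0 & 1 & -a & ac - b \\ 0 & 1 & 0 & 0 & 1 & -c \\ 0 & 0 & 1 & 0 & 0 & 1 \end{bmatrix}$.

27 $A^{-1} = \begin{bmatrix} 1 & 0 & 0 \\ -2 & 1 & -3 \\ 0 & 0 & 1 \end{bmatrix}$ (notice the pattern); $A^{-1} = \begin{bmatrix} 2 & -1 & 0 \\ -1 & 2 & -1 \\ 0 & -1 & 1 \end{bmatrix}$.

31 Elimination produces the pivots $a$ and $a - b$ and $a - b$. $A^{-1} = \dfrac{1}{a(a-b)} \begin{bmatrix} a & 0 & -b \\ -a & a & 0 \\ 0 & -a & a \end{bmatrix}$.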


33 $x = (1, 1, \ldots, 1)$ has $Px = Qx$ so $(P - Q)x = 0$.

34 $\begin{bmatrix} I & 0 \\ -C & I \end{bmatrix}$ and $\begin{bmatrix} A^{-1} & 0 \\ -D^{-1}CA^{-1} & D^{-1} \end{bmatrix}$ and $\begin{bmatrix} -D & I \\ I & 0 \end{bmatrix}$.

35 $A$ can be invertible with diagonal zeros. $B$ is singular because each row adds to zero.
38 The three Pascal matrices have $P = LU = LL^T$ and then inv($P$) = inv($L^T$)inv($L$).
42 $MM^{-1} = (I_n - UV)(I_n + U(I_m - VU)^{-1}V)$ (this is testing formula 3)
$= I_n - UV + U(I_m - VU)^{-1}V - UVU(I_m - VU)^{-1}V$ (keep simplifying)
$= I_n - UV + U(I_m - VU)(I_m - VU)^{-1}V = I_n$ (formulas 1, 2, 4 are similar)
43 4 by 4 still with $T_{11} = 1$ has pivots 1, 1, 1, 1; reversing to $T^* = UL$ makes $T^*_{44} = 1$.
44 Add the equations $Cx = b$ to find $0 = b_1 + b_2 + b_3 + b_4$. Same for $Fx = b$.
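
The Gauss-Jordan steps in Problem 22 can be reproduced mechanically. Below is a hedged sketch (the name `inverse` is chosen here; it reduces the block matrix $[A \ \ I]$ to $[I \ \ A^{-1}]$ with exact fractions and is not the book's own code):

```python
from fractions import Fraction

def inverse(A):
    """Gauss-Jordan: reduce [A I] to [I A^-1] and return A^-1."""
    n = len(A)
    # Build the n by 2n block matrix [A I]
    M = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(A)]
    for k in range(n):
        p = next((i for i in range(k, n) if M[i][k] != 0), None)
        if p is None:
            raise ValueError("matrix is singular: fewer than n pivots")
        M[k], M[p] = M[p], M[k]
        pivot = M[k][k]
        M[k] = [v / pivot for v in M[k]]      # make the pivot equal to 1
        for i in range(n):
            if i != k:
                m = M[i][k]                   # clear column k above and below
                M[i] = [a - m * c for a, c in zip(M[i], M[k])]
    return [row[n:] for row in M]

print(inverse([[1, 3], [2, 7]]))
print(inverse([[1, 4], [3, 9]]))
```

The second call returns the fractions 4/3 and −1/3 exactly, which is why `Fraction` is used instead of floating point.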


Problem Set 2.6, page 102 


3 $\ell_{31} = 1$ and $\ell_{32} = 2$ (and $\ell_{33} = 1$): reverse steps to get $Ax = b$ from $Ux = c$: 1 times $(x + y + z = 5)$ + 2 times $(y + 2z = 2)$ + 1 times $(z = 2)$ gives $x + 3y + 6z = 11$.


crf) EEG E ERE) 


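6 $E_{32}E_{21}A = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -2 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ -2 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} A = \begin{bmatrix} 1 & 1 & 1 \\ 0 & 2 & 3 \\ 0 & 0 & -6 \end{bmatrix} = U$. Then $A = \begin{bmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ 0 & 2 & 1 \end{bmatrix} U$ is the same as $E_{21}^{-1}E_{32}^{-1}U = LU$. The multipliers $\ell_{21} = \ell_{32} = 2$ fall into place in $L$.
10 $c = 2$ leads to zero in the second pivot position: exchange rows and not singular. $c = 1$ leads to zero in the third pivot position. In this case the matrix is singular.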


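12 $A = \begin{bmatrix} 2 & 4 \\ 4 & 11 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 2 & 1 \end{bmatrix} \begin{bmatrix} 2 & 4 \\ 0 & 3 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 2 & 1 \end{bmatrix} \begin{bmatrix} 2 & 0 \\ 0 & 3 \end{bmatrix} \begin{bmatrix} 1 & 2 \\ 0 & 1 \end{bmatrix} = LDU$; $U$ is $L^T$.

$\begin{bmatrix} 1 & 4 & 0 \\ 4 & 12 & 4 \\ 0 & 4 & 0 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 4 & 1 & 0 \\ 0 & -1 & 1 \end{bmatrix} \begin{bmatrix} 1 & 4 & 0 \\ 0 & -4 & 4 \\ 0 & 0 & 4 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 4 & 1 & 0 \\ 0 & -1 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ 0 & -4 & 0 \\ 0 & 0 & 4 \end{bmatrix} \begin{bmatrix} 1 & 4 & 0 \\ 0 & 1 & -1 \\ 0 & 0 & 1 \end{bmatrix} = LDL^T$.

14 $\begin{bmatrix} a & r & r & r \\ a & b & s & s \\ a & b & c & t \\ a & b & c & d \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ 1 & 1 & 1 & 0 \\ 1 & 1 & 1 & 1 \end{bmatrix} \begin{bmatrix} a & r & r & r \\ 0 & b - r & s - r & s - r \\ 0 & 0 & c - s & t - s \\ 0 & 0 & 0 & d - t \end{bmatrix}$. Need $a \neq 0$, $b \neq r$, $c \neq s$, $d \neq t$.

15 $\begin{bmatrix} 1 & 0 \\ 4 & 1 \end{bmatrix} c = \begin{bmatrix} 2 \\ 11 \end{bmatrix}$ gives $c = \begin{bmatrix} 2 \\ 3 \end{bmatrix}$. Then $\begin{bmatrix} 2 & 4 \\ 0 & 1 \end{bmatrix} x = \begin{bmatrix} 2 \\ 3 \end{bmatrix}$ gives $x = \begin{bmatrix} -5 \\ 3 \end{bmatrix}$.
$Ax = b$ is $LUx = \begin{bmatrix} 2 & 4 \\ 8 & 17 \end{bmatrix} x = \begin{bmatrix} 2 \\ 11 \end{bmatrix}$. Forward to $\begin{bmatrix} 2 & 4 \\ 0 & 1 \end{bmatrix} x = \begin{bmatrix} 2 \\ 3 \end{bmatrix} = c$.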


18 (a) Multiply $LDU = L_1D_1U_1$ by inverses to get $L_1^{-1}LD = D_1U_1U^{-1}$. The left side is lower triangular, the right side is upper triangular $\Rightarrow$ both sides are diagonal.
(b) $L$, $U$, $L_1$, $U_1$ have diagonal 1's so $D = D_1$. Then $L_1^{-1}L$ and $U_1U^{-1}$ are both $I$.

20 A tridiagonal $T$ has 2 nonzeros in the pivot row and only one nonzero below the pivot (one operation to find $\ell$ and then one for the new pivot!). $T$ = bidiagonal $L$ times bidiagonal $U$.

23 The 2 by 2 upper submatrix $A_2$ has the first two pivots 5, 9. Reason: Elimination on $A$ starts in the upper left corner with elimination on $A_2$.

24 The upper left blocks all factor at the same time as $A$: $A_k$ is $L_kU_k$.

25 The $i, j$ entry of $L^{-1}$ is $j/i$ for $i \geq j$. And $L_{i,i-1}$ is $(1 - i)/i$ below the diagonal.

26 $(K^{-1})_{ij} = j(n - i + 1)/(n + 1)$ for $i \geq j$ (and symmetric): $(n + 1)K^{-1}$ looks good.
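
The factorizations in Problems 6 and 12 can be checked by recording the multipliers during elimination. This is a minimal sketch that assumes no row exchanges are needed (the function name `lu_no_exchange` is an illustrative choice, not the book's code):

```python
from fractions import Fraction

def lu_no_exchange(A):
    """Factor A = LU, storing each multiplier l_ik in L. Assumes nonzero pivots."""
    n = len(A)
    U = [[Fraction(v) for v in row] for row in A]
    L = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    for k in range(n - 1):
        if U[k][k] == 0:
            raise ValueError("zero pivot: a row exchange would be needed")
        for i in range(k + 1, n):
            L[i][k] = U[i][k] / U[k][k]       # multiplier falls into place in L
            U[i] = [a - L[i][k] * c for a, c in zip(U[i], U[k])]
    return L, U

# Matrix of Problem 6, rebuilt from the L and U printed in the answer
L, U = lu_no_exchange([[1, 1, 1], [2, 4, 5], [0, 4, 0]])
print(L)
print(U)
```

The pivots appear on the diagonal of `U`, so dividing each row of `U` by its pivot would give the $LDU$ form used in Problem 12.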
Problem Set 2.7, page 115
2 $(AB)^T$ is not $A^TB^T$ except when $AB = BA$. Transpose that to find: $B^TA^T = A^TB^T$.

4 $A = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}$ has $A^2 = 0$. The diagonal of $A^TA$ has dot products of columns of $A$ with themselves. If $A^TA = 0$, zero dot products $\Rightarrow$ zero columns $\Rightarrow$ $A$ = zero matrix.

6 $M^T = \begin{bmatrix} A^T & C^T \\ B^T & D^T \end{bmatrix}$; $M^T = M$ needs $A^T = A$ and $B^T = C$ and $D^T = D$.

8 The 1 in row 1 has $n$ choices; then the 1 in row 2 has $n - 1$ choices ... ($n!$ overall).
10 $(3,1,2,4)$ and $(2,3,1,4)$ keep 4 in place; 6 more even $P$'s keep 1 or 2 or 3 in place; $(2,1,4,3)$ and $(3,4,1,2)$ exchange 2 pairs. $(1,2,3,4)$, $(4,3,2,1)$ make 12 even $P$'s.
14 The $i, j$ entry of $PAP$ is the $n - i + 1,\ n - j + 1$ entry of $A$. Diagonal will reverse order.
18 (a) $5 + 4 + 3 + 2 + 1 = 15$ independent entries if $A = A^T$ (b) $L$ has 10 and $D$ has 5; total 15 in $LDL^T$ (c) Zero diagonal if $A^T = -A$, leaving $4 + 3 + 2 + 1 = 10$ choices.


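20 $\begin{bmatrix} 1 & 3 \\ 3 & 2 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 3 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 0 & -7 \end{bmatrix} \begin{bmatrix} 1 & 3 \\ 0 & 1 \end{bmatrix}$; $\begin{bmatrix} 1 & b \\ b & c \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ b & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 0 & c - b^2 \end{bmatrix} \begin{bmatrix} 1 & b \\ 0 & 1 \end{bmatrix}$;

$\begin{bmatrix} 2 & -1 & 0 \\ -1 & 2 & -1 \\ 0 & -1 & 2 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ -\frac{1}{2} & 1 & 0 \\ 0 & -\frac{2}{3} & 1 \end{bmatrix} \begin{bmatrix} 2 & 0 & 0 \\ 0 & \frac{3}{2} & 0 \\ 0 & 0 & \frac{4}{3} \end{bmatrix} \begin{bmatrix} 1 & -\frac{1}{2} & 0 \\ 0 & 1 & -\frac{2}{3} \\ 0 & 0 & 1 \end{bmatrix} = LDL^T$.

22 $\begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix} A = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 2 & 3 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 1 \\ 0 & 0 & -1 \end{bmatrix}$; $\begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{bmatrix} A = \begin{bmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 2 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 2 & 0 \\ 0 & -1 & 1 \\ 0 & 0 & 1 \end{bmatrix}$.

24 $PA = LU$ is $\begin{bmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{bmatrix} \begin{bmatrix} 0 & 1 & 2 \\ 0 & 3 & 8 \\ 2 & 1 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 1/3 & 1 \end{bmatrix} \begin{bmatrix} 2 & 1 & 1 \\ 0 & 3 & 8 \\ 0 & 0 & -2/3 \end{bmatrix}$. If we wait to exchange and $a_{12}$ is the pivot, $A = L_1P_1U_1 = \begin{bmatrix} 1 & 0 & 0 \\ 3 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{bmatrix} \begin{bmatrix} 2 & 1 & 1 \\ 0 & 1 & 2 \\ 0 & 0 & 2 \end{bmatrix}$.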


26 One way to decide even vs. odd is to count all pairs that P has in the wrong order. Then 
P is even or odd when that count is even or odd. Hard step: Show that an exchange 
always switches that count! Then 3 or 5 exchanges will leave that count odd. 


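31 $\begin{bmatrix} 1 & 50 \\ 40 & 1000 \\ 2 & 50 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = Ax$; $A^Ty = \begin{bmatrix} 1 & 40 & 2 \\ 50 & 1000 & 50 \end{bmatrix} \begin{bmatrix} 700 \\ 3 \\ 3000 \end{bmatrix} = \begin{bmatrix} 6820 \\ 188000 \end{bmatrix}$ (1 truck, 1 plane).

32 $Ax \cdot y$ is the cost of inputs while $x \cdot A^Ty$ is the value of outputs.
33 $P^3 = I$ so three rotations for 360°; $P$ rotates around $(1, 1, 1)$ by 120°.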


36 These are groups: Lower triangular with diagonal 1's, diagonal invertible $D$, permutations $P$, orthogonal matrices with $Q^T = Q^{-1}$.

37 Certainly $B^T$ is northwest. $B^2$ is a full matrix! $B^{-1}$ is southeast: $\begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix}^{-1} = \begin{bmatrix} 0 & 1 \\ 1 & -1 \end{bmatrix}$. The rows of $B$ are in reverse order from a lower triangular $L$, so $B = PL$. Then $B^{-1} = L^{-1}P^{-1}$ has the columns in reverse order from $L^{-1}$. So $B^{-1}$ is southeast. Northwest $B = PL$ times southeast $PU$ is $(PLP)U$ = upper triangular.
38 There are $n!$ permutation matrices of order $n$. Eventually two powers of $P$ must be the same: If $P^r = P^s$ then $P^{r-s} = I$. Certainly $r - s \leq n!$
$P = \begin{bmatrix} P_2 & 0 \\ 0 & P_3 \end{bmatrix}$ is 5 by 5 with $P_2 = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$ and $P_3 = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{bmatrix}$ and $P^6 = I$.
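
The claim $P^6 = I$ in Problem 38 can be checked by composing the permutation with itself until the identity returns. A small sketch, assuming the 5 by 5 block matrix acts on indices 0 to 4 as a swap of (0, 1) together with a 3-cycle on (2, 3, 4); the function name `order` is chosen for this example:

```python
def order(perm):
    """Smallest k >= 1 with P^k = I, where P sends index i to perm[i]."""
    identity = list(range(len(perm)))
    current, k = list(perm), 1
    while current != identity:
        current = [perm[i] for i in current]  # apply the permutation once more
        k += 1
    return k

block = [1, 0, 3, 4, 2]   # swap on {0, 1}, 3-cycle on {2, 3, 4}
print(order(block))       # least common multiple of 2 and 3
print(order([1, 2, 0]))   # a single 3-cycle, as in Problem 33
```

The order is the same whether the matrix is read as acting on rows or on columns, so the indexing convention does not affect the result.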


Problem Set 3.1, page 127 


1 $x + y \neq y + x$ and $x + (y + z) \neq (x + y) + z$ and $(c_1 + c_2)x \neq c_1x + c_2x$.


3 (a) $cx$ may not be in our set: not closed under multiplication. Also no 0 and no $-x$ (b) $c(x + y)$ is the usual $(xy)^c$, while $cx + cy$ is the usual $(x^c)(y^c)$. Those are equal. With $c = 3$, $x = 2$, $y = 1$ this is $3(2 + 1) = 8$. The zero vector is the number 1.


5 (a) One possibility: The matrices cA form a subspace not containing B (b) Yes: the 
subspace must contain A — B = I (c) Matrices whose main diagonal is all zero. 


9 (a) The vectors with integer components allow addition, but not multiplication by $\frac{1}{2}$ (b) Remove the $x$ axis from the $xy$ plane (but leave the origin). Multiplication by any $c$ is allowed but not all vector additions.

11 (a) All matrices $\begin{bmatrix} a & b \\ 0 & 0 \end{bmatrix}$ (b) All matrices $\begin{bmatrix} a & a \\ 0 & 0 \end{bmatrix}$ (c) All diagonal matrices.

15 (a) Two planes through $(0, 0, 0)$ probably intersect in a line through $(0, 0, 0)$
(b) The plane and line probably intersect in the point $(0, 0, 0)$
(c) If $x$ and $y$ are in both $S$ and $T$, $x + y$ and $cx$ are in both subspaces.


20 (a) Solution only if $b_2 = 2b_1$ and $b_3 = -b_1$ (b) Solution only if $b_3 = -b_1$.

23 The extra column $b$ enlarges the column space unless $b$ is already in the column space.
$[\,A \ \ b\,] = \begin{bmatrix} 1 & 0 & 1 \\ 0 & 0 & 1 \end{bmatrix}$ (larger column space, no solution to $Ax = b$); $\begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 1 \end{bmatrix}$ ($b$ is in column space, $Ax = b$ has a solution).

25 The solution to $Az = b + b^*$ is $z = x + y$. If $b$ and $b^*$ are in $C(A)$ so is $b + b^*$.

30 (a) If $u$ and $v$ are both in $S + T$, then $u = s_1 + t_1$ and $v = s_2 + t_2$. So $u + v = (s_1 + s_2) + (t_1 + t_2)$ is also in $S + T$. And so is $cu = cs_1 + ct_1$: a subspace.
(b) If $S$ and $T$ are different lines, then $S \cup T$ is just the two lines (not a subspace) but $S + T$ is the whole plane that they span.

31 If $S = C(A)$ and $T = C(B)$ then $S + T$ is the column space of $M = [\,A \ \ B\,]$.

32 The columns of $AB$ are combinations of the columns of $A$. So all columns of $[\,A \ \ AB\,]$ are already in $C(A)$. But $A = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}$ has a larger column space than $A^2 = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}$. For square matrices, the column space is $R^n$ when $A$ is invertible.


Problem Set 3.2, page 140 


2 (a) Free variables $x_2$, $x_4$, $x_5$ and solutions $(-2, 1, 0, 0, 0)$, $(0, 0, -2, 1, 0)$, $(0, 0, -3, 0, 1)$ (b) Free variable $x_3$: solution $(1, -1, 1)$. Special solution for each free variable.

4 $R = \begin{bmatrix} 1 & 2 & 0 & 0 & 0 \\ 0 & 0 & 1 & 2 & 3 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}$, $R = \begin{bmatrix} 1 & 0 & -1 \\ 0 & 1 & 1 \\ 0 & 0 & 0 \end{bmatrix}$, $R$ has the same nullspace as $U$ and $A$.

6 (a) Special solutions $(3, 1, 0)$ and $(5, 0, 1)$ (b) $(3, 1, 0)$. Total of pivot and free is $n$.

8 $R = \begin{bmatrix} 1 & -3 & -5 \\ 0 & 0 & 0 \end{bmatrix}$ with $I = [\,1\,]$; $R = \begin{bmatrix} 1 & -3 & 0 \\ 0 & 0 & 1 \end{bmatrix}$ with $I = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$.

10 (a) Impossible row 1 (b) $A$ = invertible (c) $A$ = all ones (d) $A = 2I$, $R = I$.
14 If column 1 = column 5 then $x_5$ is a free variable. Its special solution is $(-1, 0, 0, 0, 1)$.

16 The nullspace contains only $x = 0$ when $A$ has 5 pivots. Also the column space is $R^5$, because we can solve $Ax = b$ and every $b$ is in the column space.


20 Column 5 is sure to have no pivot since it is a combination of earlier columns. With 4 pivots in the other columns, the special solution is $s = (1, 0, 1, 0, 1)$. The nullspace contains all multiples of this vector $s$ (a line in $R^5$).

24 This construction is impossible: 2 pivot columns and 2 free variables, only 3 columns.

26, 30 $A = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}$ has $N(A) = C(A)$ and also (a)(b)(c) are all false. Notice rref($A^T$) $= \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}$.


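32 Any zero rows come after these rows: $R = [\,1 \ \ -2 \ \ -3\,]$, $R = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix}$, $R = I$.

33 (a) $\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$, $\begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}$, $\begin{bmatrix} 1 & 1 \\ 0 & 0 \end{bmatrix}$, $\begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}$, $\begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}$ (b) All 8 matrices are $R$'s!

35 The nullspace of $B = [\,A \ \ A\,]$ contains all vectors $x = \begin{bmatrix} y \\ -y \end{bmatrix}$ for $y$ in $R^4$.

36 If $Cx = 0$ then $Ax = 0$ and $Bx = 0$. So $N(C) = N(A) \cap N(B)$ = intersection.

37 Currents: $y_1 - y_3 + y_4 = -y_1 + y_2 + y_5 = -y_2 + y_3 + y_6 = -y_4 - y_5 - y_6 = 0$. These equations add to $0 = 0$. Free variables $y_3$, $y_5$, $y_6$: watch for flows around loops.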


Problem Set 3.3, page 151 


1 (a) and (c) are correct; (d) is false because R might have 1’s in nonpivot columns. 


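4 $R_A = \begin{bmatrix} 1 & 2 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix}$, $R_B = [\,R_A \ \ R_A\,]$, $R_C \to \begin{bmatrix} R_A & 0 \\ 0 & R_A \end{bmatrix} \to$ zero rows go to the bottom.

5 I think $R_1 = A_1$, $R_2 = A_2$ is true. But $R_1 - R_2$ may have $-1$'s in some pivots.

7 Special solutions in $N = [\,-2 \ -4 \ \ 1 \ \ 0;\ -3 \ -5 \ \ 0 \ \ 1\,]$ and $[\,1 \ \ 0 \ \ 0;\ 0 \ -2 \ \ 1\,]$.
13 $P$ has rank $r$ (the same as $A$) because elimination produces the same pivot columns.
14 The rank of $R^T$ is also $r$. The example matrix $A$ has rank 2 with invertible $S$:

$$P = \begin{bmatrix} 1 & 3 \\ 2 & 6 \\ 2 & 7 \end{bmatrix} \qquad P^T = \begin{bmatrix} 1 & 2 & 2 \\ 3 & 6 & 7 \end{bmatrix} \qquad S^T = \begin{bmatrix} 1 & 2 \\ 3 & 7 \end{bmatrix} \qquad S = \begin{bmatrix} 1 & 3 \\ 2 & 7 \end{bmatrix}$$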


16 $(uv^T)(wz^T) = u(v^Tw)z^T$ has rank one unless the inner product is $v^Tw = 0$.


baa MeS 
Ld 
18 If we know that rank($B^TA^T$) $\leq$ rank($A^T$), then since rank stays the same for transposes (apologies that this fact is not yet proved), we have rank($AB$) $\leq$ rank($A$).


20 Certainly $A$ and $B$ have at most rank 2. Then their product $AB$ has at most rank 2. Since $BA$ is 3 by 3, it cannot be $I$ even if $AB = I$.


21 (a) A and B will both have the same nullspace and row space as the R they share. 
(b) A equals an invertible matrix times B, when they share the same R. A key fact! 


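22 $A$ = (pivot columns)(nonzero rows of $R$) $= \begin{bmatrix} 1 & 0 \\ 1 & 4 \\ 1 & 8 \end{bmatrix} \begin{bmatrix} 1 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 1 & 0 \\ 1 & 1 & 0 \\ 1 & 1 & 0 \end{bmatrix} + \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 4 \\ 0 & 0 & 8 \end{bmatrix}$. $B = \begin{bmatrix} 2 & 2 \\ 2 & 3 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$ = columns times rows $= \begin{bmatrix} 2 & 0 \\ 2 & 0 \end{bmatrix} + \begin{bmatrix} 0 & 2 \\ 0 & 3 \end{bmatrix}$.

26 The $m$ by $n$ matrix $Z$ has $r$ ones to start its main diagonal. Otherwise $Z$ is all zeros.

27 $R = \begin{bmatrix} I & F \\ 0 & 0 \end{bmatrix}$ with blocks of size $r$ by $r$, $r$ by $n - r$, $m - r$ by $r$, $m - r$ by $n - r$; rref($R^T$) $= \begin{bmatrix} I & 0 \\ 0 & 0 \end{bmatrix}$; rref($R^TR$) = same $R$.

28 The row-column reduced echelon form is always $\begin{bmatrix} I & 0 \\ 0 & 0 \end{bmatrix}$; $I$ is $r$ by $r$.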


Problem Set 3.4, page 163 


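2 $\begin{bmatrix} 2 & 1 & 3 & b_1 \\ 6 & 3 & 9 & b_2 \\ 4 & 2 & 6 & b_3 \end{bmatrix} \to \begin{bmatrix} 2 & 1 & 3 & b_1 \\ 0 & 0 & 0 & b_2 - 3b_1 \\ 0 & 0 & 0 & b_3 - 2b_1 \end{bmatrix}$. Then $[\,R \ \ d\,] = \begin{bmatrix} 1 & 1/2 & 3/2 & 5 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}$.
$Ax = b$ has a solution when $b_2 - 3b_1 = 0$ and $b_3 - 2b_1 = 0$; $C(A)$ = line through $(2, 6, 4)$ which is the intersection of the planes $b_2 - 3b_1 = 0$ and $b_3 - 2b_1 = 0$; the nullspace contains all combinations of $s_1 = (-1/2, 1, 0)$ and $s_2 = (-3/2, 0, 1)$; particular solution $x_p = d = (5, 0, 0)$ and complete solution $x_p + c_1s_1 + c_2s_2$.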
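
The reduced form $R$ and the special solutions in Problem 2 can be reproduced with a short routine. This is a sketch under the usual conventions (pivot columns found left to right, each free variable set to 1 in turn); the names `rref` and `special_solutions` are chosen for this example and only mirror, not reproduce, the MATLAB command of the same name:

```python
from fractions import Fraction

def rref(A):
    """Return (R, pivot_columns) for the reduced row echelon form of A."""
    M = [[Fraction(v) for v in row] for row in A]
    rows, cols = len(M), len(M[0])
    pivots, r = [], 0
    for c in range(cols):
        p = next((i for i in range(r, rows) if M[i][c] != 0), None)
        if p is None:
            continue                      # no pivot here: column c is free
        M[r], M[p] = M[p], M[r]
        pivot = M[r][c]
        M[r] = [v / pivot for v in M[r]]
        for i in range(rows):
            if i != r and M[i][c] != 0:
                m = M[i][c]
                M[i] = [a - m * b for a, b in zip(M[i], M[r])]
        pivots.append(c)
        r += 1
        if r == rows:
            break
    return M, pivots

def special_solutions(R, pivots):
    """One nullspace vector per free column: free variable 1, others 0."""
    n = len(R[0])
    sols = []
    for f in (c for c in range(n) if c not in pivots):
        s = [Fraction(0)] * n
        s[f] = Fraction(1)
        for row, p in zip(R, pivots):     # pivot row k belongs to pivots[k]
            s[p] = -row[f]
        sols.append(s)
    return sols

R, piv = rref([[2, 1, 3], [6, 3, 9], [4, 2, 6]])
print(R, piv)
print(special_solutions(R, piv))
```

With one pivot and two free columns, the routine returns the two special solutions $s_1$ and $s_2$ quoted in the answer.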


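4 $x_{\text{complete}} = x_p + x_n = (\frac{1}{2}, 0, \frac{1}{2}, 0) + x_2(-3, 1, 0, 0) + x_4(0, 0, -2, 1)$.

6 (a) Solvable if $b_2 = 2b_1$ and $3b_1 - 3b_3 + b_4 = 0$. Then $x = \begin{bmatrix} 5b_1 - 2b_3 \\ b_3 - 2b_1 \end{bmatrix} = x_p$
(b) Solvable if $b_2 = 2b_1$ and $3b_1 - 3b_3 + b_4 = 0$. $x = \begin{bmatrix} 5b_1 - 2b_3 \\ b_3 - 2b_1 \\ 0 \end{bmatrix} + x_3 \begin{bmatrix} -1 \\ -1 \\ 1 \end{bmatrix}$.

8 (a) Every $b$ is in $C(A)$: independent rows, only the zero combination gives 0.
(b) Need $b_3 = 2b_2$, because (row 3) − 2(row 2) = 0.

12 (a) $x_1 - x_2$ and 0 solve $Ax = 0$ (b) $A(2x_1 - 2x_2) = 0$, $A(2x_1 - x_2) = b$

13 (a) The particular solution $x_p$ is always multiplied by 1 (b) Any solution can be $x_p$
(c) $\begin{bmatrix} 3 & 3 \\ 3 & 3 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 6 \\ 6 \end{bmatrix}$. Then $\begin{bmatrix} 1 \\ 1 \end{bmatrix}$ is shorter (length $\sqrt{2}$) than $\begin{bmatrix} 2 \\ 0 \end{bmatrix}$ (length 2)
(d) The only "homogeneous" solution in the nullspace is $x_n = 0$ when $A$ is invertible.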


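14 If column 5 has no pivot, $x_5$ is a free variable. The zero vector is not the only solution to $Ax = 0$. If this system $Ax = b$ has a solution, it has infinitely many solutions.
16 The largest rank is 3. Then there is a pivot in every row. The solution always exists. The column space is $R^3$. An example is $A = [\,I \ \ F\,]$ for any 3 by 2 matrix $F$.

18 Rank = 2; rank = 3 unless $q = 2$ (then rank = 2). Transpose has the same rank!
25 (a) $r < m$, always $r \leq n$ (b) $r = m$, $r < n$ (c) $r < m$, $r = n$ (d) $r = m = n$.

28 $\begin{bmatrix} 1 & 2 & 3 & 0 \\ 0 & 0 & 4 & 0 \end{bmatrix} \to \begin{bmatrix} 1 & 2 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix}$; $x_n = \begin{bmatrix} -2 \\ 1 \\ 0 \end{bmatrix}$; $\begin{bmatrix} 1 & 2 & 3 & 5 \\ 0 & 0 & 4 & 8 \end{bmatrix} \to \begin{bmatrix} 1 & 2 & 0 & -1 \\ 0 & 0 & 1 & 2 \end{bmatrix}$.
Free $x_2 = 0$ gives $x_p = (-1, 0, 2)$ because the pivot columns contain $I$.

30 $\begin{bmatrix} 1 & 0 & 2 & 3 & 2 \\ 1 & 3 & 2 & 0 & 5 \\ 2 & 0 & 4 & 9 & 10 \end{bmatrix} \to \begin{bmatrix} 1 & 0 & 2 & 3 & 2 \\ 0 & 3 & 0 & -3 & 3 \\ 0 & 0 & 0 & 3 & 6 \end{bmatrix} \to \begin{bmatrix} 1 & 0 & 2 & 0 & -4 \\ 0 & 1 & 0 & 0 & 3 \\ 0 & 0 & 0 & 1 & 2 \end{bmatrix}$; $x_p = \begin{bmatrix} -4 \\ 3 \\ 0 \\ 2 \end{bmatrix}$; $x_n = x_3 \begin{bmatrix} -2 \\ 0 \\ 1 \\ 0 \end{bmatrix}$.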


36 If Ax = b and Cx = b have the same solutions, A and C have the same shape and 
the same nullspace (take b = 0). If b = column 1 of A, x = (1,0,...,0) solves 
Ax =b so it solves Cx =b. Then A and C share column 1. Other columns too: A =C! 


Problem Set 3.5, page 178 


2 $v_1$, $v_2$, $v_3$ are independent (the $-1$'s are in different positions). All six vectors are on the plane $(1, 1, 1, 1) \cdot v = 0$ so no four of these six vectors can be independent.

3 If $a = 0$ then column 1 = 0; if $d = 0$ then $b$(column 1) − $a$(column 2) = 0; if $f = 0$ then all columns end in zero (they are all in the $xy$ plane, they must be dependent).

6 Columns 1, 2, 4 are independent. Also 1, 3, 4 and 2, 3, 4 and others (but not 1, 2, 3). Same column numbers (not same columns!) for $A$.

8 If $c_1(w_2 + w_3) + c_2(w_1 + w_3) + c_3(w_1 + w_2) = 0$ then $(c_2 + c_3)w_1 + (c_1 + c_3)w_2 + (c_1 + c_2)w_3 = 0$. Since the $w$'s are independent, $c_2 + c_3 = c_1 + c_3 = c_1 + c_2 = 0$. The only solution is $c_1 = c_2 = c_3 = 0$. Only this combination of $v_1$, $v_2$, $v_3$ gives 0.

11 (a) Line in $R^3$ (b) Plane in $R^3$ (c) All of $R^3$ (d) All of $R^3$.

12 $b$ is in the column space when $Ax = b$ has a solution; $c$ is in the row space when $A^Ty = c$ has a solution. False. The zero vector is always in the row space.

15 The $n$ independent vectors span a space of dimension $n$. They are a basis for that space. If they are the columns of $A$ then $m$ is not less than $n$ ($m \geq n$).

18 (a) The 6 vectors might not span $R^4$ (b) The 6 vectors are not independent (c) Any four might be a basis.

20 One basis is $(2, 1, 0)$, $(-3, 0, 1)$. A basis for the intersection with the $xy$ plane is $(2, 1, 0)$. The normal vector $(1, -2, 3)$ is a basis for the line perpendicular to the plane.

22 (a) True (b) False because the basis vectors for $R^6$ might not be in $S$.
25 Rank 2 if $c = 0$ and $d = 2$; rank 2 except when $c = d$ or $c = -d$.


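28 $\begin{bmatrix} 1 & 0 & 0 \\ -1 & 0 & 0 \end{bmatrix}$, $\begin{bmatrix} 0 & 1 & 0 \\ 0 & -1 & 0 \end{bmatrix}$, $\begin{bmatrix} 0 & 0 & 1 \\ 0 & 0 & -1 \end{bmatrix}$; $\begin{bmatrix} 1 & -1 & 0 \\ -1 & 1 & 0 \end{bmatrix}$ and $\begin{bmatrix} 1 & 0 & -1 \\ -1 & 0 & 1 \end{bmatrix}$.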


32 $y(0) = 0$ requires $A + B + C = 0$. One basis is $\cos x - \cos 2x$ and $\cos x - \cos 3x$.
34 $y_1(x)$, $y_2(x)$, $y_3(x)$ can be $x$, $2x$, $3x$ (dim 1) or $x$, $2x$, $x^2$ (dim 2) or $x$, $x^2$, $x^3$ (dim 3).
37 The subspace of matrices that have $AS = SA$ has dimension three.


39 If the 5 by 5 matrix $[\,A \ \ b\,]$ is invertible, $b$ is not a combination of the columns of $A$. If $[\,A \ \ b\,]$ is singular, and the 4 columns of $A$ are independent, $b$ is a combination of those columns. In this case $Ax = b$ has a solution.


1 1 1 ji 1 oo, 
41z7=)1 — 1| + 1 + il-lı l The six P’s l 
1 1 1 1 1 are dependent 


42 The dimension of $S$ is (a) zero when $x = 0$ (b) one when $x = (1, 1, 1, 1)$ (c) three when $x = (1, 1, -1, -1)$ because all rearrangements have $x_1 + \cdots + x_4 = 0$ (d) four when the $x$'s are not equal and don't add to zero. No $x$ gives dim $S$ = 2.


43 The problem is to show that the $u$'s, $v$'s, $w$'s together are independent. We know the $u$'s and $v$'s together are a basis for $V$, and the $u$'s and $w$'s together are a basis for $W$. Suppose a combination of $u$'s, $v$'s, $w$'s gives 0. To be proved: All coefficients = zero.
Key idea: The part $x$ from the $u$'s and $v$'s is in $V$, so the part from the $w$'s is $-x$. This part is now in $V$ and also in $W$. But if $-x$ is in $V \cap W$ it is a combination of $u$'s only. Now $x - x = 0$ uses only $u$'s and $v$'s (independent in $V$!) so all coefficients of $u$'s and $v$'s must be zero. Then $x = 0$ and the coefficients of the $w$'s are also zero.


44 The inputs to an $m$ by $n$ matrix fill $R^n$. The outputs (column space!) have dimension $r$. The nullspace has $n - r$ special solutions. The formula becomes $r + (n - r) = n$.


Problem Set 3.6, page 190 


1 (a) Row and column space dimensions = 5, nullspace dimension = 4, dim($N(A^T)$) = 2, sum = 16 = $m + n$ (b) Column space is $R^3$; left nullspace contains only 0.

4 (a) $\begin{bmatrix} 1 & 0 \\ 1 & 0 \\ 0 & 1 \end{bmatrix}$ (b) Impossible: $r + (n - r)$ must be 3 (c) $[\,1 \ \ 1\,]$ (d) $\begin{bmatrix} -9 & -3 \\ 3 & 1 \end{bmatrix}$
(e) Impossible. Row space = column space requires $m = n$. Then $m - r = n - r$; nullspaces have the same dimension. Section 4.1 will prove $N(A)$ and $N(A^T)$ orthogonal to the row and column spaces respectively; here those are the same space.

6 $A$: dim 2, 2, 2, 1: Rows $(0, 3, 3, 3)$ and $(0, 1, 0, 1)$; columns $(3, 0, 1)$ and $(3, 0, 0)$; nullspace $(1, 0, 0, 0)$ and $(0, -1, 0, 1)$; $N(A^T)$ $(0, 1, 0)$. $B$: dim 1, 1, 0, 2: Row space (1), column space $(1, 4, 5)$, nullspace: empty basis, $N(A^T)$ $(-4, 1, 0)$ and $(-5, 0, 1)$.

9 (a) Same row space and nullspace. So rank (dimension of row space) is the same (b) Same column space and left nullspace. Same rank (dimension of column space).


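11 (a) No solution means that $r < m$. Always $r \leq n$. Can't compare $m$ and $n$ (b) Since $m - r > 0$, the left nullspace must contain a nonzero vector.

12 A neat choice is $\begin{bmatrix} 1 & 1 \\ 0 & 2 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} 1 & 0 & 1 \\ 1 & 2 & 0 \end{bmatrix} = \begin{bmatrix} 2 & 2 & 1 \\ 2 & 4 & 0 \\ 1 & 0 & 1 \end{bmatrix}$; $r + (n - r) = n = 3$ does not match $2 + 2 = 4$. Only $v = 0$ is in both $N(A)$ and $C(A^T)$.
16 If $Av = 0$ and $v$ is a row of $A$ then $v \cdot v = 0$.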
18 Row 3 − 2 row 2 + row 1 = zero row so the vectors $c(1, -2, 1)$ are in the left nullspace. The same vectors happen to be in the nullspace (an accident for this matrix).

20 (a) Special solutions $(-1, 2, 0, 0)$ and $(-1, 0, -3, 1)$ are perpendicular to the rows of $R$ (and then $ER$). (b) $A^Ty = 0$ has 1 independent solution = last row of $E^{-1}$. ($E^{-1}A = R$ has a zero row, which is just the transpose of $A^Ty = 0$).

21 (a) $u$ and $w$ (b) $v$ and $z$ (c) rank < 2 if $u$ and $w$ are dependent or if $v$ and $z$ are dependent (d) The rank of $uv^T + wz^T$ is 2.

24 $A^Ty = d$ puts $d$ in the row space of $A$; unique solution if the left nullspace (nullspace of $A^T$) contains only $y = 0$.

26 The rows of $C = AB$ are combinations of the rows of $B$. So rank $C \leq$ rank $B$. Also rank $C \leq$ rank $A$, because the columns of $C$ are combinations of the columns of $A$.

29 $a_{11} = 1$, $a_{12} = 0$, $a_{13} = 1$, $a_{22} = 0$, $a_{32} = 1$, $a_{31} = 0$, $a_{23} = 1$, $a_{33} = 0$, $a_{21} = 1$.

30 The subspaces for $A = uv^T$ are pairs of orthogonal lines ($v$ and $v^\perp$, $u$ and $u^\perp$). If $B$ has those same four subspaces then $B = cA$ with $c \neq 0$.

31 (a) $AX = 0$ if each column of $X$ is a multiple of $(1, 1, 1)$; dim(nullspace) = 3.
(b) If $AX = B$ then all columns of $B$ add to zero; dimension of the $B$'s = 6.
(c) $3 + 6 = \dim(M^{3 \times 3}) = 9$ entries in a 3 by 3 matrix.

32 The key is equal row spaces. First row of $A$ = combination of the rows of $B$: only possible combination (notice $I$) is 1 (row 1 of $B$). Same for each row so $F = G$.


Problem Set 4.1, page 202 


1 Both nullspace vectors are orthogonal to the row space vector in $R^3$. The column space is perpendicular to the nullspace of $A^T$ (two lines in $R^2$ because rank = 1).

3 (a) $\begin{bmatrix} 1 & 2 & -3 \\ 2 & -3 & 1 \\ -3 & 5 & -2 \end{bmatrix}$ (b) Impossible, $\begin{bmatrix} 2 \\ -3 \\ 5 \end{bmatrix}$ not orthogonal to $\begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}$ (c) $\begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}$ and $\begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}$ in $C(A)$ and $N(A^T)$ is impossible: not perpendicular (d) Need $A^2 = 0$; take $A = \begin{bmatrix} 1 & -1 \\ 1 & -1 \end{bmatrix}$ (e) $(1, 1, 1)$ in the nullspace (columns add to 0) and also row space; no such matrix.

6 Multiply the equations by $y_1, y_2, y_3 = 1, 1, -1$. Equations add to $0 = 1$ so no solution: $y = (1, 1, -1)$ is in the left nullspace. $Ax = b$ would need $0 = (y^TA)x = y^Tb = 1$.

8 $x = x_r + x_n$, where $x_r$ is in the row space and $x_n$ is in the nullspace. Then $Ax_n = 0$ and $Ax = Ax_r + Ax_n = Ax_r$. All $Ax$ are in $C(A)$.

9 $Ax$ is always in the column space of $A$. If $A^TAx = 0$ then $Ax$ is also in the nullspace of $A^T$. So $Ax$ is perpendicular to itself. Conclusion: $Ax = 0$ if $A^TAx = 0$.

10 (a) With $A^T = A$, the column and row spaces are the same (b) $x$ is in the nullspace and $z$ is in the column space = row space: so these "eigenvectors" have $x^Tz = 0$.

12 $x$ splits into $x_r + x_n = (1, -1) + (1, 1) = (2, 0)$. Notice $N(A^T)$ is a plane. $(1, 0) = (1, 1)/2 + (1, -1)/2 = x_n + x_r$.

13 $V^TW$ = zero makes each basis vector for $V$ orthogonal to each basis vector for $W$. Then every $v$ in $V$ is orthogonal to every $w$ in $W$ (combinations of the basis vectors).
14 $Ax = B\hat{x}$ means that $[\,A \ \ B\,] \begin{bmatrix} x \\ -\hat{x} \end{bmatrix} = 0$. Three homogeneous equations in four unknowns always have a nonzero solution. Here $x = (3, 1)$ and $\hat{x} = (1, 0)$ and $Ax = B\hat{x} = (5, 6, 5)$ is in both column spaces. Two planes in $R^3$ must share a line.

16 $A^Ty = 0$ leads to $(Ax)^Ty = x^TA^Ty = 0$. Then $y \perp Ax$ and $N(A^T) \perp C(A)$.

18 $S^\perp$ is the nullspace of $A = \begin{bmatrix} 1 & 5 & 1 \\ 2 & 2 & 2 \end{bmatrix}$. Therefore $S^\perp$ is a subspace even if $S$ is not.

21 For example $(-5, 0, 1, 1)$ and $(0, 1, -1, 0)$ span $S^\perp$ = nullspace of $A = \begin{bmatrix} 1 & 2 & 2 & 3 \\ 1 & 3 & 3 & 2 \end{bmatrix}$.

23 $x$ in $V^\perp$ is perpendicular to any vector in $V$. Since $V$ contains all the vectors in $S$, $x$ is also perpendicular to any vector in $S$. So every $x$ in $V^\perp$ is also in $S^\perp$.

28 (a) $(1, -1, 0)$ is in both planes. Normal vectors are perpendicular, but planes still intersect! (b) Need three orthogonal vectors to span the whole orthogonal complement. (c) Lines can meet at the zero vector without being orthogonal.

30 When $AB = 0$, the column space of $B$ is contained in the nullspace of $A$. Therefore the dimension of $C(B)$ $\leq$ dimension of $N(A)$. This means rank($B$) $\leq 4 -$ rank($A$).

31 null(N') produces a basis for the row space of $A$ (perpendicular to $N(A)$).
32 We need $r^Tn = 0$ and $c^T\ell = 0$. All possible examples have the form $a\,c\,r^T$ with $a \neq 0$.

33 Both $r$'s orthogonal to both $n$'s, both $c$'s orthogonal to both $\ell$'s, each pair independent. All $A$'s with these subspaces have the form $[\,c_1 \ \ c_2\,]M[\,r_1 \ \ r_2\,]^T$ for a 2 by 2 invertible $M$.


Problem Set 4.2, page 214 


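1 (a) $a^Tb/a^Ta = 5/3$; $p = 5a/3$; $e = (-2, 1, 1)/3$ (b) $a^Tb/a^Ta = -1$; $p = -a$; $e = 0$.

3 $P_1 = \dfrac{1}{3} \begin{bmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix}$ and $P_1b = \dfrac{1}{3} \begin{bmatrix} 5 \\ 5 \\ 5 \end{bmatrix}$. $P_2 = \dfrac{1}{11} \begin{bmatrix} 1 & 3 & 1 \\ 3 & 9 & 3 \\ 1 & 3 & 1 \end{bmatrix}$ and $P_2b = \begin{bmatrix} 1 \\ 3 \\ 1 \end{bmatrix}$.

6 $p_1 = (\frac{1}{9}, -\frac{2}{9}, -\frac{2}{9})$ and $p_2 = (\frac{4}{9}, \frac{4}{9}, -\frac{2}{9})$ and $p_3 = (\frac{4}{9}, -\frac{2}{9}, \frac{4}{9})$. So $p_1 + p_2 + p_3 = b$.
9 Since $A$ is invertible, $P = A(A^TA)^{-1}A^T = AA^{-1}(A^T)^{-1}A^T = I$: project on all of $R^2$.
11 (a) $p = A(A^TA)^{-1}A^Tb = (2, 3, 0)$, $e = (0, 0, 4)$, $A^Te = 0$ (b) $p = (4, 4, 6)$, $e = 0$.
15 $2A$ has the same column space as $A$. $\hat{x}$ for $2A$ is half of $\hat{x}$ for $A$.
16 $\frac{1}{2}(1, 2, -1) + \frac{3}{2}(1, 0, 1) = (2, 1, 1)$. So $b$ is in the plane. Projection shows $Pb = b$.

18 (a) $I - P$ is the projection matrix onto $(1, -1)$ in the perpendicular direction to $(1, 1)$ (b) $I - P$ projects onto the plane $x + y + z = 0$ perpendicular to $(1, 1, 1)$.

20 $e = \begin{bmatrix} 1 \\ -1 \\ -2 \end{bmatrix}$, $Q = \dfrac{ee^T}{e^Te} = \begin{bmatrix} 1/6 & -1/6 & -1/3 \\ -1/6 & 1/6 & 1/3 \\ -1/3 & 1/3 & 2/3 \end{bmatrix}$, $I - Q = \begin{bmatrix} 5/6 & 1/6 & 1/3 \\ 1/6 & 5/6 & -1/3 \\ 1/3 & -1/3 & 1/3 \end{bmatrix}$.

21 $(A(A^TA)^{-1}A^T)^2 = A(A^TA)^{-1}(A^TA)(A^TA)^{-1}A^T = A(A^TA)^{-1}A^T$. So $P^2 = P$. $Pb$ is in the column space (where $P$ projects). Then its projection $P(Pb)$ is $Pb$.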
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24 The nullspace of AT is orthogonal to the column space C (A). So if ATb = 0, the pro- 
jection of b onto C (A) should be p = 0. Check Pb = A(A™A)71ATb = A(A™A)“'0. 

28 P? = P = PT give PTP = P. Then the (2, 2) entry of P equals the (2, 2) entry of 
PTP which is the length squared of column 2. 

29 A = BT has independent columns, so ATA (which is BBT) must be invertible. 


aa 25|12 25 

(b) The row space is the line through v = (1,2,2) and Pr = vv'/v'v. Always 
Pc A = A (columns of A project to themselves) and APr = A. Then Pc APR = A! 

31 The error e = b — p must be perpendicular to all the a’s. 

32 Since $P_1b$ is in $C(A)$, $P_2(P_1b)$ equals $P_1b$. So $P_2P_1 = P_1 = aa^T/a^Ta$ where
$a = (1, 2, 0)$.

33 If $P_1P_2 = P_2P_1$ then $S$ is contained in $T$ or $T$ is contained in $S$.

34 $BB^T$ is invertible as in Problem 29. Then $(A^TA)(BB^T)$ = product of $r$ by $r$ invertible
matrices, so rank $r$. $AB$ can't have rank $< r$, since $A^T$ and $B^T$ cannot increase the rank.
Conclusion: $A$ ($m$ by $r$ of rank $r$) times $B$ ($r$ by $n$ of rank $r$) produces $AB$ of rank $r$.


30 (a) The column space is the line through $a = \begin{bmatrix}3\\4\end{bmatrix}$ so $P_C = \dfrac{aa^T}{a^Ta} = \dfrac{1}{25}\begin{bmatrix}9&12\\12&16\end{bmatrix}$.


Problem Set 4.3, page 226 


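1 $A = \begin{bmatrix}1&0\\1&1\\1&3\\1&4\end{bmatrix}$ and $b = \begin{bmatrix}0\\8\\8\\20\end{bmatrix}$ give $A^TA = \begin{bmatrix}4&8\\8&26\end{bmatrix}$ and $A^Tb = \begin{bmatrix}36\\112\end{bmatrix}$.
$A^TA\hat x = A^Tb$ gives $\hat x = \begin{bmatrix}1\\4\end{bmatrix}$ and $p = A\hat x = \begin{bmatrix}1\\5\\13\\17\end{bmatrix}$ and $e = b - p = \begin{bmatrix}-1\\3\\-5\\3\end{bmatrix}$.

5 $E = (C-0)^2 + (C-8)^2 + (C-8)^2 + (C-20)^2$. $A^T = [1\ 1\ 1\ 1]$ and $A^TA = [4]$.
$A^Tb = [36]$ and $(A^TA)^{-1}A^Tb = 9$ = best height $C$. Errors $e = (-9, -1, -1, 11)$.

7 $A = [0\ 1\ 3\ 4]^T$, $A^TA = [26]$ and $A^Tb = [112]$. Best $D = 112/26 = 56/13$.

8 $\hat x = 56/13$, $p = (56/13)(0, 1, 3, 4)$. $(C, D) = (9, 56/13)$ don't match $(C, D) = (1, 4)$.
Columns of $A$ were not perpendicular so we can't project separately to find $C$ and $D$.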
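
The best line in Problem 1 comes from the 2 by 2 normal equations $A^TA\hat x = A^Tb$. As a small sketch (the function name `fit_line` is only illustrative), they can be solved exactly by Cramer's rule for the data $t = 0, 1, 3, 4$ and $b = 0, 8, 8, 20$:

```python
from fractions import Fraction

def fit_line(ts, bs):
    """Solve the 2x2 normal equations for the least squares line b ~ C + D t."""
    m = len(ts)
    st = sum(ts)                                  # sum of t_i
    stt = sum(t * t for t in ts)                  # sum of t_i^2
    sb = sum(bs)                                  # sum of b_i
    stb = sum(t * b for t, b in zip(ts, bs))      # sum of t_i b_i
    det = m * stt - st * st                       # det of A^T A
    C = Fraction(sb * stt - st * stb, det)        # Cramer's rule
    D = Fraction(m * stb - st * sb, det)
    return C, D

ts = [0, 1, 3, 4]
bs = [0, 8, 8, 20]
C, D = fit_line(ts, bs)
p = [C + D * t for t in ts]              # heights of the best line
e = [b - pi for b, pi in zip(bs, p)]     # errors, perpendicular to both columns

print(C, D)   # best intercept and slope
print(p, e)
```

The error vector is perpendicular to the column of ones and to the column of $t$'s, which is the statement $A^Te = 0$.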


9 Parabola. Project $b$ 4D to 3D: $A\hat x = \begin{bmatrix}1&0&0\\1&1&1\\1&3&9\\1&4&16\end{bmatrix}\begin{bmatrix}C\\D\\E\end{bmatrix} = \begin{bmatrix}0\\8\\8\\20\end{bmatrix}$. $A^TA\hat x = \begin{bmatrix}4&8&26\\8&26&92\\26&92&338\end{bmatrix}\begin{bmatrix}C\\D\\E\end{bmatrix} = \begin{bmatrix}36\\112\\400\end{bmatrix}$.

11 (a) The best line $x = 1 + 4t$ gives the center point $\hat b = 9$ when $\hat t = 2$.
(b) The first equation $Cm + D\sum t_i = \sum b_i$ divided by $m$ gives $C + D\hat t = \hat b$.

13 $(A^TA)^{-1}A^T(b - Ax) = \hat x - x$. When $e = b - Ax$ averages to 0, so does $\hat x - x$.

14 The matrix $(\hat x - x)(\hat x - x)^T$ is $(A^TA)^{-1}A^T(b - Ax)(b - Ax)^TA(A^TA)^{-1}$. When the
average of $(b - Ax)(b - Ax)^T$ is $\sigma^2I$, the average of $(\hat x - x)(\hat x - x)^T$ will be the
output covariance matrix $(A^TA)^{-1}A^T\sigma^2A(A^TA)^{-1}$ which simplifies to $\sigma^2(A^TA)^{-1}$.

16 $\frac{1}{10}b_{10} + \frac{9}{10}\hat x_9 = \frac{1}{10}(b_1 + \cdots + b_{10})$. Knowing $\hat x_9$ avoids adding all $b$'s.


18 $p = A\hat x = (5, 13, 17)$ gives the heights of the closest line. The error is $b - p =
(2, -6, 4)$. This error $e$ has $Pe = Pb - Pp = p - p = 0$.

21 $e$ is in $N(A^T)$; $p$ is in $C(A)$; $\hat x$ is in $C(A^T)$; $N(A) = \{0\}$ = zero vector only.

23 The square of the distance between points on two lines is $E = (y - x)^2 + (3y - x)^2 +
(1 + x)^2$. Derivatives $\frac12\partial E/\partial x = 3x - 4y + 1 = 0$ and $\frac12\partial E/\partial y = -4x + 10y = 0$.
The solution is $x = -5/7$, $y = -2/7$; $E = 2/7$, and the minimum distance is $\sqrt{2/7}$.

25 3 points on a line: Equal slopes $(b_2 - b_1)/(t_2 - t_1) = (b_3 - b_2)/(t_3 - t_2)$. Linear algebra:
Orthogonal to $(1, 1, 1)$ and $(t_1, t_2, t_3)$ is $y = (t_2 - t_3, t_3 - t_1, t_1 - t_2)$ in the left nullspace.
$b$ is in the column space. Then $y^Tb = 0$ is the same equal slopes condition written as
$(b_2 - b_1)(t_3 - t_2) = (b_3 - b_2)(t_2 - t_1)$.

27 The shortest link connecting two lines in space is perpendicular to those lines.

28 Only 1 plane contains $0, a_1, a_2$ unless $a_1, a_2$ are dependent. Same test for $a_1, \ldots, a_n$.


Problem Set 4.4, page 239 


3 (a) $A^TA$ will be $16I$ (b) $A^TA$ will be diagonal with entries 1, 4, 9.
6 $Q_1Q_2$ is orthogonal because $(Q_1Q_2)^TQ_1Q_2 = Q_2^TQ_1^TQ_1Q_2 = Q_2^TQ_2 = I$.
8 If $q_1$ and $q_2$ are orthonormal vectors in $\mathbf{R}^5$ then $(q_1^Tb)q_1 + (q_2^Tb)q_2$ is closest to $b$.

11 (a) Two orthonormal vectors are $q_1 = \frac{1}{10}(1, 3, 4, 5, 7)$ and $q_2 = \frac{1}{10}(-7, 3, 4, -5, 1)$
(b) Closest in the plane: project $QQ^T(1, 0, 0, 0, 0) = (0.5, -0.18, -0.24, 0.4, 0)$.

13 The multiple to subtract is $\dfrac{a^Tb}{a^Ta}$. Then $B = b - \dfrac{a^Tb}{a^Ta}a = (4, 0) - 2\cdot(1, 1) = (2, -2)$.

14 $\begin{bmatrix}1&4\\1&0\end{bmatrix} = [q_1\ q_2]\begin{bmatrix}\|a\|&q_1^Tb\\0&\|B\|\end{bmatrix} = \begin{bmatrix}1/\sqrt2&1/\sqrt2\\1/\sqrt2&-1/\sqrt2\end{bmatrix}\begin{bmatrix}\sqrt2&2\sqrt2\\0&2\sqrt2\end{bmatrix} = QR$.

15 (a) $q_1 = \frac13(1, 2, -2)$, $q_2 = \frac13(2, 1, 2)$, $q_3 = \frac13(2, -2, -1)$ (b) The nullspace
of $A^T$ contains $q_3$ (c) $\hat x = (A^TA)^{-1}A^T(1, 2, 7) = (1, 2)$.

16 The projection $p = (a^Tb/a^Ta)a = 14a/49 = 2a/7$ is closest to $b$; $q_1 = a/\|a\| =
a/7$ is $(4, 5, 2, 2)/7$. $B = b - p = (-1, 4, -4, -4)/7$ has $\|B\| = 1$ so $q_2 = B$.

18 $A = a = (1, -1, 0, 0)$; $B = b - p = (\frac12, \frac12, -1, 0)$; $C = c - p_A - p_B = (\frac13, \frac13, \frac13, -1)$.
Notice the pattern in those orthogonal $A, B, C$. In $\mathbf{R}^5$, $D$ would be $(\frac14, \frac14, \frac14, \frac14, -1)$.
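
The pattern in Problem 18 can be reproduced with a short Gram-Schmidt routine. This is a sketch under the assumption that the starting vectors are $a = (1, -1, 0, 0)$, $b = (0, 1, -1, 0)$, $c = (0, 0, 1, -1)$; it keeps the vectors orthogonal without normalizing them, as the answer does. The function names are illustrative.

```python
from fractions import Fraction

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def gram_schmidt(vectors):
    """Return orthogonal (not normalized) vectors by subtracting projections."""
    basis = []
    for v in vectors:
        w = [Fraction(x) for x in v]
        for q in basis:
            coeff = dot(q, w) / dot(q, q)      # component of w along q
            w = [wi - coeff * qi for wi, qi in zip(w, q)]
        basis.append(w)
    return basis

# Assumed inputs a, b, c for Problem 18
A, B, C = gram_schmidt([(1, -1, 0, 0), (0, 1, -1, 0), (0, 0, 1, -1)])
print(A, B, C)
```

Dividing each result by its length would give the orthonormal $q_1, q_2, q_3$.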


20 (a) True (b) True. $Qx = x_1q_1 + x_2q_2$. $\|Qx\|^2 = x_1^2 + x_2^2$ because $q_1 \cdot q_2 = 0$.

21 The orthonormal vectors are $q_1 = (1, 1, 1, 1)/2$ and $q_2 = (-5, -1, 1, 5)/\sqrt{52}$. Then
$b = (-4, -3, 3, 0)$ projects to $p = (-7, -3, -1, 3)/2$. And $b - p = (-1, -3, 7, -3)/2$
is orthogonal to both $q_1$ and $q_2$.

22 $A = (1, 1, 2)$, $B = (1, -1, 0)$, $C = (-1, -1, 1)$. These are not yet unit vectors.

26 $(q_2^TC^*)q_2 = \dfrac{B^Tc}{B^TB}B$ because $q_2 = \dfrac{B}{\|B\|}$ and the extra $q_1$ in $C^*$ is orthogonal to $q_2$.

28 There are $mn$ multiplications in (11) and $\frac12m^2n$ multiplications in each part of (12).

30 The wavelet matrix $W$ has orthonormal columns. Notice $W^{-1} = W^T$ in Section 7.3.

32 $Q_1 = \begin{bmatrix}1&0\\0&-1\end{bmatrix}$ reflects across $x$ axis, $Q_2 = \begin{bmatrix}1&0&0\\0&0&-1\\0&-1&0\end{bmatrix}$ across plane $y + z = 0$.

33 Orthogonal and lower triangular $\Rightarrow$ $\pm1$ on the main diagonal and zeros elsewhere.


Problem Set 5.1, page 251 


1 $\det(2A) = 8$; $\det(-A) = (-1)^4\det A = \frac12$; $\det(A^2) = \frac14$; $\det(A^{-1}) = 2 = \det(A^T)^{-1}$.
5 $|J_5| = 1$, $|J_6| = -1$, $|J_7| = -1$. Determinants $1, 1, -1, -1$ repeat so $|J_{101}| = 1$.
8 $Q^TQ = I \Rightarrow |Q|^2 = 1 \Rightarrow |Q| = \pm1$; $Q^n$ stays orthogonal so det can't blow up.

10 If the entries in every row add to zero, then $(1, 1, \ldots, 1)$ is in the nullspace: singular
$A$ has det = 0. (The columns add to the zero column so they are linearly dependent.)
If every row adds to one, then rows of $A - I$ add to zero (not necessarily $\det A = 1$).

11 $CD = -DC \Rightarrow \det CD = (-1)^n\det DC$ and not $-\det DC$. If $n$ is even we can have
an invertible $CD$.

14 $\det(A) = 36$ and the 4 by 4 second difference matrix has det = 5.
15 The first determinant is 0, the second is $1 - 2t^2 + t^4 = (1 - t^2)^2$.

17 Any 3 by 3 skew-symmetric $K$ has $\det(K^T) = \det(-K) = (-1)^3\det(K)$. This is
$-\det(K)$. But always $\det(K^T) = \det(K)$, so we must have $\det(K) = 0$ for 3 by 3.

21 Rules 5 and 3 give Rule 2. (Since Rules 4 and 3 give 5, they also give Rule 2.)

23 $\det(A) = 10$, $A^2 = \begin{bmatrix}18&7\\14&11\end{bmatrix}$, $\det(A^2) = 100$, $A^{-1} = \frac{1}{10}\begin{bmatrix}3&-1\\-2&4\end{bmatrix}$ has det $\frac{1}{10}$.
$\det(A - \lambda I) = \lambda^2 - 7\lambda + 10 = 0$ when $\lambda = 2$ or $\lambda = 5$; those are eigenvalues.
27 $\det A = abc$, $\det B = -abcd$, $\det C = a(b - a)(c - b)$ by doing elimination.

32 Typical determinants of rand(n) are $10^6, 10^{25}, 10^{79}, 10^{218}$ for $n = 50, 100, 200, 400$.
randn(n) with normal distribution gives $10^{31}, 10^{78}, 10^{186}$, Inf which means $\ge 2^{1024}$.
MATLAB allows $1.999999999999999 \times 2^{1023} \approx 1.8 \times 10^{308}$ but one more 9 gives Inf!
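
The overflow threshold in Problem 32 is a property of IEEE double precision, not of MATLAB alone. As a quick sketch, the same limit can be seen in Python, whose floats are also doubles:

```python
import sys

# Largest finite double: (2 - 2**-52) * 2**1023, roughly 1.8e308
largest = sys.float_info.max
same = (2 - 2**-52) * 2.0**1023

# Doubling the largest finite value leaves the representable range
overflow = largest * 2

print(largest)
print(largest == same)
print(overflow == float("inf"))
```

This is why determinants of large random matrices are usually tracked through their logarithms or through the pivots rather than as a single product.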


Problem Set 5.2, page 263 


2 det A = —2, independent; det B = 0, dependent; det C = —1, independent. 

4 $a_{11}a_{23}a_{32}a_{44}$ gives $-1$, because $2 \leftrightarrow 3$, $a_{14}a_{23}a_{32}a_{41}$ gives $+1$, $\det A = 1 - 1 = 0$;
$\det B = 2\cdot4\cdot4\cdot2 - 1\cdot4\cdot4\cdot1 = 64 - 16 = 48$.

6 (a) If $a_{11} = a_{22} = a_{33} = 0$ then 4 terms are sure zeros (b) 15 terms must be zero.

8 Some term $a_{1\alpha}a_{2\beta}\cdots a_{n\omega}$ in the big formula is not zero! Move rows $1, 2, \ldots, n$ into
rows $\alpha, \beta, \ldots, \omega$. Then these nonzero $a$'s will be on the main diagonal.

9 To get $+1$ for the even permutations the matrix needs an even number of $-1$'s. For the
odd $P$'s the matrix needs an odd number of $-1$'s. So six 1's and det = 6 are impossible:
five 1's and one $-1$ will give max(det) = 4.

11 $C = \begin{bmatrix}d&-c\\-b&a\end{bmatrix}$ and $AC^T = (ad - bc)I = (\det A)I$. $D = \begin{bmatrix}0&42&-35\\0&-21&14\\-3&6&-3\end{bmatrix}$; $\det B = 1(0) + 2(42) + 3(-35) = -21$.
Puzzle: $\det D = 441 = (-21)^2$. Why?

12 $C = \begin{bmatrix}3&2&1\\2&4&2\\1&2&3\end{bmatrix}$ and $AC^T = \begin{bmatrix}4&0&0\\0&4&0\\0&0&4\end{bmatrix}$. Therefore $A^{-1} = \frac14C^T = C^T/\det A$.

13 (a) $C_1 = 0$, $C_2 = -1$, $C_3 = 0$, $C_4 = 1$ (b) $C_n = -C_{n-2}$ by cofactors of row
1 then cofactors of column 1. Therefore $C_{10} = -C_8 = C_6 = -C_4 = C_2 = -1$.


15 The 1,1 cofactor of the $n$ by $n$ matrix is $E_{n-1}$. The 1,2 cofactor has a single 1 in its
first column, with cofactor $E_{n-2}$: sign gives $-E_{n-2}$. So $E_n = E_{n-1} - E_{n-2}$. Then $E_1$
to $E_6$ is $1, 0, -1, -1, 0, 1$ and this cycle of six will repeat: $E_{100} = E_4 = -1$.
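
The cycle of six in Problem 15 follows directly from the recurrence $E_n = E_{n-1} - E_{n-2}$. A short sketch, taking the starting values $E_1 = 1$ and $E_2 = 0$ from the answer above (the function name `E` is illustrative):

```python
def E(n):
    """n-th determinant from the recurrence E_n = E_{n-1} - E_{n-2}."""
    prev, curr = 1, 0            # E_1 = 1, E_2 = 0 (values quoted in the answer)
    if n == 1:
        return prev
    for _ in range(n - 2):
        prev, curr = curr, curr - prev
    return curr

first_six = [E(n) for n in range(1, 7)]
print(first_six)
print(E(100), E(4))   # 100 = 6*16 + 4, so both land on the same point of the cycle
```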


16 The 1,1 cofactor of the $n$ by $n$ matrix is $F_{n-1}$. The 1,2 cofactor has a 1 in column
1, with cofactor $F_{n-2}$. Multiply by $(-1)^{1+2}$ and also $(-1)$ from the 1,2 entry to find
$F_n = F_{n-1} + F_{n-2}$ (so these determinants are Fibonacci numbers).

19 Since $x, x^2, x^3$ are all in the same row, they are never multiplied in $\det V_4$. The determinant
is zero at $x = a$ or $b$ or $c$, so $\det V$ has factors $(x - a)(x - b)(x - c)$. Multiply
by the cofactor $V_3$. The Vandermonde matrix $V_{ij} = (x_i)^{j-1}$ is for fitting a polynomial
$p(x) = b$ at the points $x_i$. It has $\det V$ = product of all $x_k - x_m$ for $k > m$.

20 $G_2 = -1$, $G_3 = 2$, $G_4 = -3$, and $G_n = (-1)^{n-1}(n - 1)$ = (product of the $\lambda$'s).
24 (a) All $L$'s have det = 1; $\det U_k = \det A_k = 2, 6, -6$ (b) Pivots $5, 6/5, 7/6$.

25 Problem 23 gives $\det\begin{bmatrix}I&0\\-CA^{-1}&I\end{bmatrix} = 1$ and $\det\begin{bmatrix}A&B\\C&D\end{bmatrix} = |A|$ times $|D - CA^{-1}B|$
which is $|AD - ACA^{-1}B|$. If $AC = CA$ this is $|AD - CAA^{-1}B| = \det(AD - CB)$.
27 (a) $\det A = a_{11}C_{11} + \cdots + a_{1n}C_{1n}$. Derivative with respect to $a_{11}$ = cofactor $C_{11}$.

29 There are five nonzero products, all 1's with a plus or minus sign. Here are the (row,
column) numbers and the signs: $+(1,1)(2,2)(3,3)(4,4) + (1,2)(2,1)(3,4)(4,3) -
(1,2)(2,1)(3,3)(4,4) - (1,1)(2,2)(3,4)(4,3) - (1,1)(2,3)(3,2)(4,4)$. Total $-1$.

32 The problem is to show that $F_{2n+2} = 3F_{2n} - F_{2n-2}$. Keep using Fibonacci's rule:
$F_{2n+2} = F_{2n+1} + F_{2n} = F_{2n} + F_{2n-1} + F_{2n} = 2F_{2n} + (F_{2n} - F_{2n-2}) = 3F_{2n} - F_{2n-2}$.
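
The identity in Problem 32 is easy to spot-check. The sketch below assumes the usual indexing $F_0 = 0$, $F_1 = 1$; the identity itself does not depend on where the indexing starts, because it only uses Fibonacci's rule.

```python
def fib(n):
    """n-th Fibonacci number with F_0 = 0, F_1 = 1 (assumed indexing)."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

# Check F_{2n+2} = 3 F_{2n} - F_{2n-2} for a range of n
holds = all(fib(2 * n + 2) == 3 * fib(2 * n) - fib(2 * n - 2)
            for n in range(1, 50))
print(holds)
```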


33 The difference from 20 to 19 multiplies its 3 by 3 cofactor = 1: then det drops by 1. 


34 (a) The last three rows must be dependent (b) In each of the 120 terms: Choices 
from the last 3 rows must use 3 columns; at least one of those choices will be zero. 


Problem Set 5.3, page 278 
2 (a) $y = \begin{vmatrix}a&1\\c&0\end{vmatrix} \Big/ \begin{vmatrix}a&b\\c&d\end{vmatrix} = -c/(ad - bc)$ (b) $y = \det B_2/\det A = (fg - id)/D$.
3 (a) $x_1 = 3/0$ and $x_2 = -2/0$: no solution (b) $x_1 = x_2 = 0/0$: undetermined.

4 (a) $x_1 = \det([b\ a_2\ a_3])/\det A$, if $\det A \ne 0$ (b) The determinant is linear in
its first column so $x_1|a_1\ a_2\ a_3| + x_2|a_2\ a_2\ a_3| + x_3|a_3\ a_2\ a_3|$. The last two determinants
are zero because of repeated columns, leaving $x_1|a_1\ a_2\ a_3|$ which is $x_1\det A$.

6 (a) $\begin{bmatrix}1&-\frac23&0\\0&\frac13&0\\0&-\frac73&1\end{bmatrix}$ (b) $\frac14\begin{bmatrix}3&2&1\\2&4&2\\1&2&3\end{bmatrix}$. An invertible symmetric matrix has a symmetric inverse.

8 $C = \begin{bmatrix}6&-3&0\\3&1&-1\\-6&2&1\end{bmatrix}$ and $AC^T = \begin{bmatrix}3&0&0\\0&3&0\\0&0&3\end{bmatrix}$. This is $(\det A)I$ and $\det A = 3$.
The 1,3 cofactor of $A$ is 0. Multiplying by 4 or 100: no change.


9 If we know the cofactors and $\det A = 1$, then $C^T = A^{-1}$ and also $\det A^{-1} = 1$.
Now $A$ is the inverse of $C^T$, so $A$ can be found from the cofactor matrix for $C$.

11 The cofactors of $A$ are integers. Division by $\det A = \pm1$ gives integer entries in $A^{-1}$.

15 For $n = 5$, $C$ contains 25 cofactors and each 4 by 4 cofactor has 24 terms. Each term
needs 3 multiplications: total 1800 multiplications vs. 125 for Gauss-Jordan.

17 Volume $= \begin{vmatrix}3&1&1\\1&3&1\\1&1&3\end{vmatrix} = 20$. Area of faces = length of cross product $\begin{vmatrix}i&j&k\\3&1&1\\1&3&1\end{vmatrix} = -2i - 2j + 8k$, length $= 6\sqrt2$.
18 (a) Area $\frac12\begin{vmatrix}2&1&1\\3&4&1\\0&5&1\end{vmatrix} = 5$ (b) 5 + new triangle area $\frac12\begin{vmatrix}2&1&1\\0&5&1\\-1&0&1\end{vmatrix} = 5 + 7 = 12$.

21 The maximum volume is $L_1L_2L_3L_4$ reached when the edges are orthogonal in $\mathbf{R}^4$.
With entries 1 and $-1$ all lengths are $\sqrt4 = 2$. The maximum determinant is $2^4 = 16$,
achieved in Problem 20. For a 3 by 3 matrix, $\det A = (\sqrt3)^3$ can't be achieved.

23 $A^TA = \begin{bmatrix}a^T\\b^T\\c^T\end{bmatrix}[a\ b\ c] = \begin{bmatrix}a^Ta&0&0\\0&b^Tb&0\\0&0&c^Tc\end{bmatrix}$ has $\det A^TA = (\|a\|\|b\|\|c\|)^2$ and $\det A = \pm\|a\|\|b\|\|c\|$.

25 The $n$-dimensional cube has $2^n$ corners, $n2^{n-1}$ edges and $2n$ $(n-1)$-dimensional faces.
Coefficients from $(2 + x)^n$ in Worked Example 2.4A. Cube from $2I$ has volume $2^n$.

26 The pyramid has volume $\frac16$. The 4-dimensional pyramid has volume $\frac{1}{24}$ (and $\frac{1}{n!}$ in $\mathbf{R}^n$).

31 Base area 10, height 2, volume 20.

35 $S = (2, 1, -1)$, area $\|PQ \times PS\| = \|(-2, -2, -1)\| = 3$. The other four corners
can be $(0, 0, 0)$, $(0, 0, 2)$, $(1, 2, 2)$, $(1, 1, 0)$. The volume of the tilted box is $|\det| = 1$.

39 $AC^T = (\det A)I$ gives $(\det A)(\det C) = (\det A)^n$. Then $\det A = (\det C)^{1/3}$ with
$n = 4$. With $\det A^{-1}$ is $1/\det A$, construct $A^{-1}$ using the cofactors. Invert to find $A$.


Problem Set 6.1, page 293 


1 The eigenvalues are 1 and 0.5 for $A$, 1 and 0.25 for $A^2$, 1 and 0 for $A^\infty$. Exchanging
the rows of $A$ changes the eigenvalues to 1 and $-0.5$ (the trace is now $0.2 + 0.3$).
Singular matrices stay singular during elimination, so $\lambda = 0$ does not change.

3 $A$ has $\lambda_1 = 2$ and $\lambda_2 = -1$ (check trace and determinant) with $x_1 = (1, 1)$ and
$x_2 = (2, -1)$. $A^{-1}$ has the same eigenvectors, with eigenvalues $1/\lambda = \frac12$ and $-1$.

6 $A$ and $B$ have $\lambda_1 = 1$ and $\lambda_2 = 1$. $AB$ and $BA$ have $\lambda = 2 \pm \sqrt3$. Eigenvalues of $AB$
are not equal to eigenvalues of $A$ times eigenvalues of $B$. Eigenvalues of $AB$ and $BA$
are equal (this is proved in section 6.6, Problems 18-19).

8 (a) Multiply $Ax$ to see $\lambda x$ which reveals $\lambda$ (b) Solve $(A - \lambda I)x = 0$ to find $x$.

10 $A$ has $\lambda_1 = 1$ and $\lambda_2 = .4$ with $x_1 = (1, 2)$ and $x_2 = (1, -1)$. $A^\infty$ has $\lambda_1 = 1$ and
$\lambda_2 = 0$ (same eigenvectors). $A^{100}$ has $\lambda_1 = 1$ and $\lambda_2 = (.4)^{100}$ which is near zero.
So $A^{100}$ is very near $A^\infty$: same eigenvectors and close eigenvalues.

11 Columns of $A - \lambda_1I$ are in the nullspace of $A - \lambda_2I$ because $M = (A - \lambda_2I)(A - \lambda_1I)$
= zero matrix [this is the Cayley-Hamilton Theorem in Problem 6.2.32]. Notice that
$M$ has zero eigenvalues $(\lambda_1 - \lambda_2)(\lambda_1 - \lambda_1) = 0$ and $(\lambda_2 - \lambda_2)(\lambda_2 - \lambda_1) = 0$.

13 (a) $Pu = (uu^T)u = u(u^Tu) = u$ so $\lambda = 1$ (b) $Pv = (uu^T)v = u(u^Tv) = 0$
(c) $x_1 = (-1, 1, 0, 0)$, $x_2 = (-3, 0, 1, 0)$, $x_3 = (-5, 0, 0, 1)$ all have $Px = 0x = 0$.

15 The other two eigenvalues are $\lambda = \frac12(-1 \pm i\sqrt3)$; the three eigenvalues are $1, 1, -1$.

16 Set $\lambda = 0$ in $\det(A - \lambda I) = (\lambda_1 - \lambda)\cdots(\lambda_n - \lambda)$ to find $\det A = (\lambda_1)(\lambda_2)\cdots(\lambda_n)$.

17 $\lambda_1 = \frac12\left(a + d + \sqrt{(a - d)^2 + 4bc}\right)$ and $\lambda_2 = \frac12\left(a + d - \sqrt{\phantom{x}}\right)$ add to $a + d$.
If $A$ has $\lambda_1 = 3$ and $\lambda_2 = 4$ then $\det(A - \lambda I) = (\lambda - 3)(\lambda - 4) = \lambda^2 - 7\lambda + 12$.

19 (a) rank = 2 (b) $\det(B^TB) = 0$ (d) eigenvalues of $(B^2 + I)^{-1}$ are $1, \frac12, \frac15$.

20 Last rows are $-28, 11$ (check trace and det) and $6, -11, 6$ (to match $\det(C - \lambda I)$).

22 $\lambda = 1$ (for Markov), 0 (for singular), $-\frac12$ (so sum of eigenvalues = trace = $\frac12$).

23 $\begin{bmatrix}0&0\\1&0\end{bmatrix}$, $\begin{bmatrix}0&1\\0&0\end{bmatrix}$, $\begin{bmatrix}-1&1\\-1&1\end{bmatrix}$. Always $A^2$ is the zero matrix if $\lambda = 0$ and 0, by the
Cayley-Hamilton Theorem in Problem 6.2.32.
28 $B$ has $\lambda = -1, -1, -1, 3$ and $C$ has $\lambda = 1, 1, 1, -3$. Both have det = $-3$.

32 (a) $u$ is a basis for the nullspace, $v$ and $w$ give a basis for the column space
(b) $x = (0, \frac13, \frac15)$ is a particular solution. Add any $cu$ from the nullspace
(c) If $Ax = u$ had a solution, $u$ would be in the column space: wrong dimension 3.

34 $\det(P - \lambda I) = 0$ gives the equation $\lambda^4 = 1$. This reflects the fact that $P^4 = I$.
The solutions of $\lambda^4 = 1$ are $\lambda = 1, i, -1, -i$. The real eigenvector $x_1 = (1, 1, 1, 1)$
is not changed by the permutation $P$. Three more eigenvectors are $(i, i^2, i^3, i^4)$ and
$(1, -1, 1, -1)$ and $(-i, (-i)^2, (-i)^3, (-i)^4)$.
36 $\lambda_1 = e^{2\pi i/3}$ and $\lambda_2 = e^{-2\pi i/3}$ give $\det \lambda_1\lambda_2 = 1$ and trace $\lambda_1 + \lambda_2 = -1$.
$A = \begin{bmatrix}\cos\theta&-\sin\theta\\\sin\theta&\cos\theta\end{bmatrix}$ with $\theta = \frac{2\pi}{3}$ has this trace and det. So does every $M^{-1}AM$!


Problem Set 6.2, page 307 


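1 $\begin{bmatrix}1&2\\0&3\end{bmatrix} = \begin{bmatrix}1&1\\0&1\end{bmatrix}\begin{bmatrix}1&0\\0&3\end{bmatrix}\begin{bmatrix}1&-1\\0&1\end{bmatrix}$; $\begin{bmatrix}1&1\\3&3\end{bmatrix} = \begin{bmatrix}1&1\\-1&3\end{bmatrix}\begin{bmatrix}0&0\\0&4\end{bmatrix}\begin{bmatrix}\frac34&-\frac14\\\frac14&\frac14\end{bmatrix}$.
3 If $A = S\Lambda S^{-1}$ then the eigenvalue matrix for $A + 2I$ is $\Lambda + 2I$ and the eigenvector
matrix is still $S$. $A + 2I = S(\Lambda + 2I)S^{-1} = S\Lambda S^{-1} + S(2I)S^{-1} = A + 2I$.
4 (a) False: don't know $\lambda$'s (b) True (c) True (d) False: need eigenvectors of $S$
6 The columns of $S$ are nonzero multiples of $(2, 1)$ and $(0, 1)$: either order. Same for $A^{-1}$.

8 $A = S\Lambda S^{-1} = \begin{bmatrix}1&1\\1&0\end{bmatrix} = \dfrac{1}{\lambda_1 - \lambda_2}\begin{bmatrix}\lambda_1&\lambda_2\\1&1\end{bmatrix}\begin{bmatrix}\lambda_1&0\\0&\lambda_2\end{bmatrix}\begin{bmatrix}1&-\lambda_2\\-1&\lambda_1\end{bmatrix}$. $S\Lambda^kS^{-1}\begin{bmatrix}1\\0\end{bmatrix} = \dfrac{1}{\lambda_1 - \lambda_2}\begin{bmatrix}\lambda_1&\lambda_2\\1&1\end{bmatrix}\begin{bmatrix}\lambda_1^k&0\\0&\lambda_2^k\end{bmatrix}\begin{bmatrix}1\\-1\end{bmatrix}$: the 2nd component is $F_k = (\lambda_1^k - \lambda_2^k)/(\lambda_1 - \lambda_2)$.

9 (a) $A = \begin{bmatrix}.5&.5\\1&0\end{bmatrix}$ has $\lambda_1 = 1$, $\lambda_2 = -\frac12$ with $x_1 = (1, 1)$, $x_2 = (1, -2)$
(b) $A^n = \begin{bmatrix}1&1\\1&-2\end{bmatrix}\begin{bmatrix}1^n&0\\0&(-.5)^n\end{bmatrix}\begin{bmatrix}\frac23&\frac13\\\frac13&-\frac13\end{bmatrix} \to A^\infty = \begin{bmatrix}\frac23&\frac13\\\frac23&\frac13\end{bmatrix}$.

12 (a) False: don't know $\lambda$ (b) True: an eigenvector is missing (c) True.

13 $A = \begin{bmatrix}8&3\\-3&2\end{bmatrix}$ (or other), $A = \begin{bmatrix}9&4\\-4&1\end{bmatrix}$, $A = \begin{bmatrix}10&5\\-5&0\end{bmatrix}$; only eigenvectors are $x = (c, -c)$.

15 $A^k = S\Lambda^kS^{-1}$ approaches zero if and only if every $|\lambda| < 1$; $A_1^k \to A_1^\infty$, $A_2^k \to 0$.

17 $\Lambda = \begin{bmatrix}.9&0\\0&.3\end{bmatrix}$, $S = \begin{bmatrix}3&-3\\1&1\end{bmatrix}$; $A_2^{10}\begin{bmatrix}3\\1\end{bmatrix} = (.9)^{10}\begin{bmatrix}3\\1\end{bmatrix}$, $A_2^{10}\begin{bmatrix}3\\-1\end{bmatrix} = (.3)^{10}\begin{bmatrix}3\\-1\end{bmatrix}$,
$A_2^{10}\begin{bmatrix}6\\0\end{bmatrix} = (.9)^{10}\begin{bmatrix}3\\1\end{bmatrix} + (.3)^{10}\begin{bmatrix}3\\-1\end{bmatrix}$ because $\begin{bmatrix}6\\0\end{bmatrix}$ is the sum of $\begin{bmatrix}3\\1\end{bmatrix} + \begin{bmatrix}3\\-1\end{bmatrix}$.

19 $B^k = \begin{bmatrix}1&1\\0&-1\end{bmatrix}\begin{bmatrix}5&0\\0&4\end{bmatrix}^k\begin{bmatrix}1&1\\0&-1\end{bmatrix} = \begin{bmatrix}5^k&5^k - 4^k\\0&4^k\end{bmatrix}$.

21 trace $ST = (aq + bs) + (cr + dt)$ is equal to $(qa + rc) + (sb + td)$ = trace $TS$.
Diagonalizable case: the trace of $S\Lambda S^{-1}$ = trace of $(\Lambda S^{-1})S$ = trace of $\Lambda$: sum of the $\lambda$'s.

24 The $A$'s form a subspace since $cA$ and $A_1 + A_2$ all have the same $S$. When $S = I$
the $A$'s with those eigenvectors give the subspace of diagonal matrices. Dimension 4.

26 Two problems: The nullspace and column space can overlap, so $x$ could be in both.
There may not be $r$ independent eigenvectors in the column space.

27 $R = S\sqrt{\Lambda}S^{-1} = \begin{bmatrix}2&1\\1&2\end{bmatrix}$ has $R^2 = A$. $\sqrt B$ needs $\lambda = \sqrt9$ and $\sqrt{-1}$, trace is not real.
Note that $\begin{bmatrix}-1&0\\0&-1\end{bmatrix}$ can have $\sqrt{-1} = i$ and $-i$, trace 0, real square root $\begin{bmatrix}0&1\\-1&0\end{bmatrix}$.

28 $A^T = A$ gives $x^TABx = (Ax)^T(Bx) \le \|Ax\|\|Bx\|$ by the Schwarz inequality.
$B^T = -B$ gives $-x^TBAx = (Bx)^T(Ax) \le \|Ax\|\|Bx\|$. Add to get Heisenberg's
Uncertainty Principle when $AB - BA = I$. Position-momentum, also time-energy.

32 If $A = S\Lambda S^{-1}$ then $(A - \lambda_1I)\cdots(A - \lambda_nI)$ equals $S(\Lambda - \lambda_1I)\cdots(\Lambda - \lambda_nI)S^{-1}$.
The factor $\Lambda - \lambda_jI$ is zero in row $j$. The product is zero in all rows = zero matrix.

33 $\lambda = 2, -1, 0$ are in $\Lambda$ and the eigenvectors are in $S$ (below). $A^k = S\Lambda^kS^{-1}$ is
$\begin{bmatrix}2&1&0\\1&-1&1\\1&-1&-1\end{bmatrix}\Lambda^k\,\dfrac16\begin{bmatrix}2&1&1\\2&-2&-2\\0&3&-3\end{bmatrix} = \dfrac{2^k}{6}\begin{bmatrix}4&2&2\\2&1&1\\2&1&1\end{bmatrix} + \dfrac{(-1)^k}{3}\begin{bmatrix}1&-1&-1\\-1&1&1\\-1&1&1\end{bmatrix}$.
Check $k = 4$. The (2, 2) entry of $A^4$ is $2^4/6 + (-1)^4/3 = 18/6 = 3$. The 4-step paths
that begin and end at node 2 are 2 to 1 to 1 to 1 to 2, 2 to 1 to 2 to 1 to 2, and 2 to 1 to
3 to 1 to 2. Much harder to find the eleven 4-step paths that start and end at node 1.

35 $B$ has $\lambda = i$ and $-i$, so $B^4$ has $\lambda^4 = 1$ and 1 and $B^4 = I$. $C$ has $\lambda = (1 \pm \sqrt3\,i)/2$.
This is $\exp(\pm\pi i/3)$ so $\lambda^3 = -1$ and $-1$. Then $C^3 = -I$ and $C^{1024} = -C$.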
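
The path counts quoted in the $k = 4$ check earlier in this problem set (three closed 4-step paths at node 2, eleven at node 1) can be confirmed by raising an adjacency matrix to the fourth power. The sketch below assumes the graph has a self-loop at node 1 plus edges 1-2 and 1-3; that choice is an inference from the eigenvalues $2, -1, 0$ and the listed paths, not a matrix copied from the problem statement.

```python
def matmul(X, Y):
    """Integer matrix product of two lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

# Assumed adjacency matrix: loop at node 1, edges 1-2 and 1-3
A = [[1, 1, 1],
     [1, 0, 0],
     [1, 0, 0]]

A4 = A
for _ in range(3):          # three more multiplications give A^4
    A4 = matmul(A4, A)

# Entry (i, i) of A^4 counts 4-step paths that start and end at node i
print(A4[1][1])   # closed paths at node 2
print(A4[0][0])   # closed paths at node 1
```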


37 Columns of $S$ times rows of $\Lambda S^{-1}$ will give $r$ rank-1 matrices ($r$ = rank of $A$).

Problem Set 6.3, page 325 


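1 $u_1 = e^{4t}\begin{bmatrix}1\\0\end{bmatrix}$, $u_2 = e^{t}\begin{bmatrix}1\\-1\end{bmatrix}$. If $u(0) = (5, -2)$, then $u(t) = 3e^{4t}\begin{bmatrix}1\\0\end{bmatrix} + 2e^{t}\begin{bmatrix}1\\-1\end{bmatrix}$.

4 $d(v + w)/dt = (w - v) + (v - w) = 0$, so the total $v + w$ is constant. $A = \begin{bmatrix}-1&1\\1&-1\end{bmatrix}$ has
$\lambda_1 = 0$, $\lambda_2 = -2$ with $x_1 = \begin{bmatrix}1\\1\end{bmatrix}$, $x_2 = \begin{bmatrix}1\\-1\end{bmatrix}$; $v(1) = 20 + 10e^{-2}$, $v(\infty) = 20$.

8 $\begin{bmatrix}6&-2\\2&1\end{bmatrix}$ has $\lambda_1 = 5$, $x_1 = \begin{bmatrix}2\\1\end{bmatrix}$, $\lambda_2 = 2$, $x_2 = \begin{bmatrix}1\\2\end{bmatrix}$; rabbits $r(t) = 20e^{5t} + 10e^{2t}$,
$w(t) = 10e^{5t} + 20e^{2t}$. The ratio of rabbits to wolves approaches 20/10; $e^{5t}$ dominates.

14 When $A$ is skew-symmetric, $\|u(t)\| = \|e^{At}u(0)\|$ is $\|u(0)\|$. So $e^{At}$ is orthogonal.

15 $u_p = 4$ and $u(t) = ce^t + 4$; $u_p = \begin{bmatrix}4\\2\end{bmatrix}$ and $u(t) = c_1e^t\begin{bmatrix}1\\t\end{bmatrix} + c_2e^t\begin{bmatrix}0\\1\end{bmatrix} + \begin{bmatrix}4\\2\end{bmatrix}$.

16 Substituting $u = e^{ct}v$ gives $ce^{ct}v = Ae^{ct}v - e^{ct}b$ or $(A - cI)v = b$ or $v =
(A - cI)^{-1}b$ = particular solution. If $c$ is an eigenvalue then $A - cI$ is not invertible.

20 The solution at time $t + T$ is also $e^{A(t+T)}u(0)$. Thus $e^{At}$ times $e^{AT}$ equals $e^{A(t+T)}$.

21 $\begin{bmatrix}1&4\\0&0\end{bmatrix} = \begin{bmatrix}1&4\\0&-1\end{bmatrix}\begin{bmatrix}1&0\\0&0\end{bmatrix}\begin{bmatrix}1&4\\0&-1\end{bmatrix}$; $e^{At} = \begin{bmatrix}1&4\\0&-1\end{bmatrix}\begin{bmatrix}e^t&0\\0&1\end{bmatrix}\begin{bmatrix}1&4\\0&-1\end{bmatrix} = \begin{bmatrix}e^t&4e^t - 4\\0&1\end{bmatrix}$.

22 $A^2 = A$ gives $e^{At} = I + At + \frac12At^2 + \cdots = I + (e^t - 1)A = \begin{bmatrix}e^t&e^t - 1\\0&1\end{bmatrix}$.

12 $A = \begin{bmatrix}0&1\\-9&6\end{bmatrix}$ has trace 6, det 9, $\lambda = 3$ and 3 with one independent eigenvector $(1, 3)$.

24 $A = \begin{bmatrix}1&1\\0&3\end{bmatrix} = \begin{bmatrix}1&1\\0&2\end{bmatrix}\begin{bmatrix}1&0\\0&3\end{bmatrix}\begin{bmatrix}1&-\frac12\\0&\frac12\end{bmatrix}$. Then $e^{At} = \begin{bmatrix}e^t&\frac12(e^{3t} - e^t)\\0&e^{3t}\end{bmatrix}$.
26 (a) The inverse of $e^{At}$ is $e^{-At}$ (b) If $Ax = \lambda x$ then $e^{At}x = e^{\lambda t}x$ and $e^{\lambda t} \ne 0$.

27 $(x, y) = (e^{4t}, -e^{4t})$ is a growing solution. The correct matrix for the exchanged $u =
(y, x)$ is $\begin{bmatrix}2&-2\\-4&0\end{bmatrix}$. It does have the same eigenvalues as the original matrix.

28 Centering produces $U_{n+1} = \begin{bmatrix}1&\Delta t\\-\Delta t&1 - (\Delta t)^2\end{bmatrix}U_n$. At $\Delta t = 1$, $\lambda = e^{i\pi/3}$ and
$e^{-i\pi/3}$ both have $\lambda^6 = 1$ so $A^6 = I$. $U_6 = A^6U_0$ comes exactly back to $U_0$.

29 First $A$ has $\lambda = \pm i$ and $A^4 = I$. Second $A$ has $\lambda = -1, -1$ and $A^n = (-1)^n\begin{bmatrix}1 - 2n&-2n\\2n&2n + 1\end{bmatrix}$. Linear growth.

30 With $a = \Delta t/2$ the trapezoidal step is $U_{n+1} = \dfrac{1}{1 + a^2}\begin{bmatrix}1 - a^2&2a\\-2a&1 - a^2\end{bmatrix}U_n$.
Orthonormal columns $\Rightarrow$ orthogonal matrix $\Rightarrow$ $\|U_{n+1}\| = \|U_n\|$.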


31 (a) $(\cos A)x = (\cos\lambda)x$ (b) $\lambda(A) = 2\pi$ and 0 so $\cos\lambda = 1, 1$ and $\cos A = I$
(c) $u(t) = 3(\cos 2\pi t)(1, 1) + 1(\cos 0t)(1, -1)$ [$u' = Au$ has exp, $u'' = Au$ has cos]


Problem Set 6.4, page 337 


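3 $\lambda = 0, 4, -2$; unit vectors $\pm(0, 1, -1)/\sqrt2$ and $\pm(2, 1, 1)/\sqrt6$ and $\pm(1, -1, -1)/\sqrt3$.

5 $Q = \dfrac13\begin{bmatrix}2&1&2\\2&-2&-1\\-1&-2&2\end{bmatrix}$. The columns of $Q$ are unit eigenvectors of $A$.
Each unit eigenvector could be multiplied by $-1$.

8 If $A^3 = 0$ then all $\lambda^3 = 0$ so all $\lambda = 0$ as in $A = \begin{bmatrix}0&1\\0&0\end{bmatrix}$. If $A$ is symmetric then
$A^3 = Q\Lambda^3Q^T = 0$ gives $\Lambda = 0$. The only symmetric $A$ is $Q\,0\,Q^T$ = zero matrix.
10 If $x$ is not real then $\lambda = x^TAx/x^Tx$ is not always real. Can't assume real eigenvectors!

11 $\begin{bmatrix}3&1\\1&3\end{bmatrix} = 2\begin{bmatrix}\frac12&-\frac12\\-\frac12&\frac12\end{bmatrix} + 4\begin{bmatrix}\frac12&\frac12\\\frac12&\frac12\end{bmatrix}$; $\begin{bmatrix}9&12\\12&16\end{bmatrix} = 0\begin{bmatrix}.64&-.48\\-.48&.36\end{bmatrix} + 25\begin{bmatrix}.36&.48\\.48&.64\end{bmatrix}$.

14 $M$ is skew-symmetric and orthogonal; $\lambda$'s must be $i, i, -i, -i$ to have trace zero.

16 (a) If $Az = \lambda y$ and $A^Ty = \lambda z$ then $B[y; -z] = [-Az; A^Ty] = -\lambda[y; -z]$. So
$-\lambda$ is also an eigenvalue of $B$. (b) $A^TAz = A^T(\lambda y) = \lambda^2z$. (c) $\lambda = -1, -1, 1, 1$;
$x_1 = (1, 0, -1, 0)$, $x_2 = (0, 1, 0, -1)$, $x_3 = (1, 0, 1, 0)$, $x_4 = (0, 1, 0, 1)$.

19 $A$ has $S = \begin{bmatrix}1&1&0\\1&-1&0\\0&0&1\end{bmatrix}$; $B$ has $S = \begin{bmatrix}1&0&1\\0&1&0\\0&0&2d\end{bmatrix}$. Perpendicular for $A$.
Not perpendicular for $B$ since $B^T \ne B$.
21 (a) False. $A = \begin{bmatrix}1&2\\0&1\end{bmatrix}$ (b) True from $A^T = Q\Lambda Q^T$ (c) True from $A^{-1} = Q\Lambda^{-1}Q^T$ (d) False!
22 $A$ and $A^T$ have the same $\lambda$'s but the order of the $x$'s can change. $A = \begin{bmatrix}0&1\\-1&0\end{bmatrix}$ has
$\lambda_1 = i$ and $\lambda_2 = -i$ with $x_1 = (1, i)$ first for $A$ but $x_1 = (1, -i)$ first for $A^T$.

23 $A$ is invertible, orthogonal, permutation, diagonalizable, Markov; $B$ is projection, diagonalizable,
Markov. $A$ allows $QR$, $S\Lambda S^{-1}$, $Q\Lambda Q^T$; $B$ allows $S\Lambda S^{-1}$ and $Q\Lambda Q^T$.

24 Symmetry gives $Q\Lambda Q^T$ if $b = 1$; repeated $\lambda$ and no $S$ if $b = -1$; singular if $b = 0$.
25 Orthogonal and symmetric requires $|\lambda| = 1$ and $\lambda$ real, so $\lambda = \pm1$. Then $A = \pm I$ or
$A = Q\Lambda Q^T = \begin{bmatrix}\cos\theta&-\sin\theta\\\sin\theta&\cos\theta\end{bmatrix}\begin{bmatrix}1&0\\0&-1\end{bmatrix}\begin{bmatrix}\cos\theta&\sin\theta\\-\sin\theta&\cos\theta\end{bmatrix} = \begin{bmatrix}\cos2\theta&\sin2\theta\\\sin2\theta&-\cos2\theta\end{bmatrix}$.
27 The roots of $\lambda^2 + b\lambda + c = 0$ differ by $\sqrt{b^2 - 4c}$. For $\det(A + tB - \lambda I)$ we have
$b = -3 - 8t$ and $c = 2 + 16t - t^2$. The minimum of $b^2 - 4c$ is $1/17$ at $t = 2/17$.
Then $\lambda_2 - \lambda_1 = 1/\sqrt{17}$.

29 (a) $A = Q\Lambda Q^H$ times $A^H = Q\Lambda^HQ^H$ equals $A^H$ times $A$ because $\Lambda\Lambda^H = \Lambda^H\Lambda$
(diagonal!) (b) step 2: The 1,1 entries of $T^HT$ and $TT^H$ are $|a|^2$ and $|a|^2 + |b|^2$.
This makes $b = 0$ and $T = \Lambda$.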
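
For Problem 27 above, the minimum of $b^2 - 4c$ can be verified exactly. The sketch below uses only the two coefficient formulas quoted in that answer, $b = -3 - 8t$ and $c = 2 + 16t - t^2$; the function name `gap_squared` is illustrative.

```python
from fractions import Fraction

def gap_squared(t):
    """Square of the gap between the two roots: b^2 - 4c."""
    b = -3 - 8 * t
    c = 2 + 16 * t - t * t
    return b * b - 4 * c        # simplifies to 68 t^2 - 16 t + 1

t_star = Fraction(2, 17)        # vertex of the parabola 68 t^2 - 16 t + 1
minimum = gap_squared(t_star)

step = Fraction(1, 100)
print(minimum)                                   # value at the claimed minimizer
print(gap_squared(t_star + step) > minimum,      # nearby points give a larger gap
      gap_squared(t_star - step) > minimum)
```

Since the minimum is positive, the two eigenvalues of $A + tB$ never meet for real $t$.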

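30 $a_{11}$ is $[q_{11}\ \cdots\ q_{1n}]\,[\lambda_1\bar q_{11}\ \cdots\ \lambda_n\bar q_{1n}]^T \le \lambda_{\max}(|q_{11}|^2 + \cdots + |q_{1n}|^2) = \lambda_{\max}$.

31 (a) $x^T(Ax) = (Ax)^Tx = x^TA^Tx = -x^TAx$. (b) $\bar z^TAz$ is pure imaginary, its real
part is $x^TAx + y^TAy = 0 + 0$ (c) $\det A = \lambda_1\cdots\lambda_n \ge 0$: pairs of $\lambda$'s $= ib, -ib$.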


Problem Set 6.5, page 350 


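3 Positive definite for $-3 < b < 3$: $\begin{bmatrix}1&0\\b&1\end{bmatrix}\begin{bmatrix}1&b\\0&9 - b^2\end{bmatrix} = \begin{bmatrix}1&0\\b&1\end{bmatrix}\begin{bmatrix}1&0\\0&9 - b^2\end{bmatrix}\begin{bmatrix}1&b\\0&1\end{bmatrix} = LDL^T$.
Positive definite for $c > 8$: $\begin{bmatrix}1&0\\2&1\end{bmatrix}\begin{bmatrix}2&4\\0&c - 8\end{bmatrix} = \begin{bmatrix}1&0\\2&1\end{bmatrix}\begin{bmatrix}2&0\\0&c - 8\end{bmatrix}\begin{bmatrix}1&2\\0&1\end{bmatrix} = LDL^T$.

4 $f(x, y) = x^2 + 4xy + 9y^2 = (x + 2y)^2 + 5y^2$; $x^2 + 6xy + 9y^2 = (x + 3y)^2$.

8 $A = \begin{bmatrix}3&6\\6&16\end{bmatrix} = \begin{bmatrix}1&0\\2&1\end{bmatrix}\begin{bmatrix}3&0\\0&4\end{bmatrix}\begin{bmatrix}1&2\\0&1\end{bmatrix}$. Pivots 3, 4 outside squares, $\ell_{ij}$ inside.
$x^TAx = 3(x + 2y)^2 + 4y^2$.

10 $A = \begin{bmatrix}2&-1&0\\-1&2&-1\\0&-1&2\end{bmatrix}$ has pivots $2, \frac32, \frac43$; $B = \begin{bmatrix}2&-1&-1\\-1&2&-1\\-1&-1&2\end{bmatrix}$ is singular; $B\begin{bmatrix}1\\1\\1\end{bmatrix} = \begin{bmatrix}0\\0\\0\end{bmatrix}$.

12 $A$ is positive definite for $c > 1$; determinants $c$, $c^2 - 1$, $(c - 1)^2(c + 2) > 0$. $B$ is
never positive definite (determinants $d - 4$ and $-4d + 12$ are never both positive).

14 The eigenvalues of $A^{-1}$ are positive because they are $1/\lambda(A)$. And the entries of $A^{-1}$
pass the determinant tests. And $x^TA^{-1}x = (A^{-1}x)^TA(A^{-1}x) > 0$ for all $x \ne 0$.

17 If $a_{jj}$ were smaller than all $\lambda$'s, $A - a_{jj}I$ would have all eigenvalues $> 0$ (positive
definite). But $A - a_{jj}I$ has a zero in the $(j, j)$ position; impossible by Problem 16.
21 $A$ is positive definite when $s > 8$; $B$ is positive definite when $t > 5$ by determinants.

22 $R = \dfrac{1}{\sqrt2}\begin{bmatrix}1&-1\\1&1\end{bmatrix}\begin{bmatrix}\sqrt9&0\\0&\sqrt1\end{bmatrix}\dfrac{1}{\sqrt2}\begin{bmatrix}1&1\\-1&1\end{bmatrix} = \begin{bmatrix}2&1\\1&2\end{bmatrix}$; $R = Q\begin{bmatrix}4&0\\0&2\end{bmatrix}Q^T = \begin{bmatrix}3&1\\1&3\end{bmatrix}$.

24 The ellipse $x^2 + xy + y^2 = 1$ has axes with half-lengths $1/\sqrt\lambda = \sqrt2$ and $\sqrt{2/3}$.

26 $A = C^TC = \begin{bmatrix}9&3\\3&5\end{bmatrix}$; $\begin{bmatrix}4&8\\8&25\end{bmatrix} = \begin{bmatrix}1&0\\2&1\end{bmatrix}\begin{bmatrix}4&0\\0&9\end{bmatrix}\begin{bmatrix}1&2\\0&1\end{bmatrix}$ and $C = \begin{bmatrix}2&4\\0&3\end{bmatrix}$.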


_ | 6x? 2x|. a. op  t.2 2 
29 H is positive definite if x #0; Fy = (5x* + y)” = 0 on the curve 


o 


No — 


1} 2x 2 
1.2 6x 1 0 1|.. . . , 
sx°+y=0; HW = | 1 1 = k ol is indefinite, (0, 1) is a saddle point of Fo. 
31 If $c > 9$ the graph of $z$ is a bowl, if $c < 9$ the graph has a saddle point. When $c = 9$
the graph of $z = (2x + 3y)^2$ is a "trough" staying at zero on the line $2x + 3y = 0$.
32 Orthogonal matrices, exponentials $e^{At}$, matrices with det = 1 are groups. Examples of
subgroups are orthogonal matrices with det = 1, exponentials $e^{An}$ for integer $n$.

34 The five eigenvalues of $K$ are $2 - 2\cos\frac{k\pi}{6} = 2 - \sqrt3, 2 - 1, 2, 2 + 1, 2 + \sqrt3$:
product of eigenvalues $= 6 = \det K$.


Problem Set 6.6, page 360


1 $B = GCG^{-1} = GF^{-1}AFG^{-1}$ so $M = FG^{-1}$. $C$ similar to $A$ and $B$ $\Rightarrow$ $A$ similar to $B$.

6 Eight families of similar matrices: six matrices have $\lambda = 0, 1$ (one family); three
matrices have $\lambda = 1, 1$ and three have $\lambda = 0, 0$ (two families each!); one has $\lambda =
1, -1$; one has $\lambda = 2, 0$; two have $\lambda = \frac12(1 \pm \sqrt5)$ (they are in one family).

7 (a) $(M^{-1}AM)(M^{-1}x) = M^{-1}(Ax) = M^{-1}0 = 0$ (b) The nullspaces of $A$ and
of $M^{-1}AM$ have the same dimension. Different vectors and different bases.

8 Same $\Lambda$, same $S$. But $A = \begin{bmatrix}0&1\\0&0\end{bmatrix}$ and $B = \begin{bmatrix}0&2\\0&0\end{bmatrix}$ have the same line of eigenvectors
and the same eigenvalues $\lambda = 0, 0$.

10 $J^2 = \begin{bmatrix}c^2&2c\\0&c^2\end{bmatrix}$ and $J^k = \begin{bmatrix}c^k&kc^{k-1}\\0&c^k\end{bmatrix}$; $J^0 = I$ and $J^{-1} = \begin{bmatrix}c^{-1}&-c^{-2}\\0&c^{-1}\end{bmatrix}$.
14 (1) Choose $M_i$ = reverse diagonal matrix to get $M_i^{-1}J_iM_i = J_i^T$ in each block
(2) $M_0$ has those diagonal blocks $M_i$ to get $M_0^{-1}JM_0 = J^T$. (3) $A^T = (M^{-1})^TJ^TM^T$
equals $(M^{-1})^TM_0^{-1}JM_0M^T = (MM_0M^T)^{-1}A(MM_0M^T)$, and $A^T$ is similar to $A$.
17 (a) False: Diagonalize a nonsymmetric $A = S\Lambda S^{-1}$. Then $\Lambda$ is symmetric and similar
(b) True: A singular matrix has $\lambda = 0$. (c) False: $\begin{bmatrix}1&0\\0&-1\end{bmatrix}$ and $\begin{bmatrix}-1&0\\0&1\end{bmatrix}$ are similar
(they have $\lambda = \pm1$) (d) True: Adding $I$ increases all eigenvalues by 1
18 $AB = B^{-1}(BA)B$ so $AB$ is similar to $BA$. If $ABx = \lambda x$ then $BA(Bx) = \lambda(Bx)$.
19 Diagonal blocks 6 by 6, 4 by 4; $AB$ has the same eigenvalues as $BA$ plus $6 - 4$ zeros.

22 $A = MJM^{-1}$, $A^n = MJ^nM^{-1} = 0$ (each $J^k$ has 1's on the $k$th diagonal).
$\det(A - \lambda I) = \lambda^n$ so $J^n = 0$ by the Cayley-Hamilton Theorem.


Problem Set 6.7, page 371 


aevzvte[n afe Jf f all ol =] 


/10 J5 
4 $A^TA = AA^T = \begin{bmatrix}2&1\\1&1\end{bmatrix}$ has eigenvalues $\sigma_1^2 = \dfrac{3 + \sqrt5}{2}$, $\sigma_2^2 = \dfrac{3 - \sqrt5}{2}$. But $A$ is indefinite.
$\sigma_1 = (1 + \sqrt5)/2 = \lambda_1(A)$, $\sigma_2 = (\sqrt5 - 1)/2 = -\lambda_2(A)$; $u_1 = v_1$ but $u_2 = -v_2$.
5 A proof that eigshow finds the SVD. When $V_1 = (1, 0)$, $V_2 = (0, 1)$ the demo finds
$AV_1$ and $AV_2$ at some angle $\theta$. A 90° turn by the mouse to $V_2, -V_1$ finds $AV_2$ and
$-AV_1$ at the angle $\pi - \theta$. Somewhere between, the constantly orthogonal $v_1$ and $v_2$
must produce $Av_1$ and $Av_2$ at angle $\pi/2$. Those orthogonal directions give $u_1$ and $u_2$.

9 $A = UV^T$ since all $\sigma_j = 1$, which means that $\Sigma = I$.
14 The smallest change in $A$ is to set its smallest singular value $\sigma_2$ to zero.
15 The singular values of $A + I$ are not $\sigma_j + 1$. Need eigenvalues of $(A + I)^T(A + I)$.

17 $A = U\Sigma V^T$ = [cosines including $u_4$] diag(sqrt$(2 - \sqrt2, 2, 2 + \sqrt2)$) [sine matrix]$^T$.
$AV = U\Sigma$ says that differences of sines in $V$ are cosines in $U$ times $\sigma$'s.


Problem Set 7.1, page 380


$T(v) = (0, 1)$ and $T(v) = v_1v_2$ are not linear.
(a) $S(T(v)) = v$ (b) $S(T(v_1) + T(v_2)) = S(T(v_1)) + S(T(v_2))$.
Choose $v = (1, 1)$ and $w = (-1, 0)$. $T(v) + T(w) = (0, 1)$ but $T(v + w) = (0, 0)$.

(a) $T(T(v)) = v$ (b) $T(T(v)) = v + (2, 2)$ (c) $T(T(v)) = -v$ (d) $T(T(v)) = T(v)$.

10 Not invertible: (a) $T(1, 0) = 0$ (b) $(0, 0, 1)$ is not in the range (c) $T(0, 1) = 0$.
12 Write $v$ as a combination $c(1, 1) + d(2, 0)$. Then $T(v) = c(2, 2) + d(0, 0)$. $T(v) =
(4, 4); (2, 2); (2, 2)$; if $v = (a, b) = b(1, 1) + \frac{a - b}{2}(2, 0)$ then $T(v) = b(2, 2) + (0, 0)$.

16 No matrix $A$ gives $A\begin{bmatrix}0&0\\1&0\end{bmatrix} = \begin{bmatrix}0&1\\0&0\end{bmatrix}$. To professors: Linear transformations on
matrix space come from 4 by 4 matrices. Those in Problems 13-15 were special.
17 (a) True (b) True (c) True (d) False.
19 $T(T^{-1}(M)) = M$ so $T^{-1}(M) = A^{-1}MB^{-1}$.

20 (a) Horizontal lines stay horizontal, vertical lines stay vertical (b) House squashes
onto a line (c) Vertical lines stay vertical because $T(1, 0) = (a_{11}, 0)$.

27 Also 30 emphasizes that circles are transformed to ellipses (see figure in Section 6.7).

29 (a) $ad - bc = 0$ (b) $ad - bc > 0$ (c) $|ad - bc| = 1$. If vectors to two
corners transform to themselves then by linearity $T = I$. (Fails if one corner is $(0, 0)$.)


Problem Set 7.2, page 395 


3 (Matrix $A$)$^2$ = $B$ when (transformation $T$)$^2$ = $S$ and output basis = input basis.

5 $T(v_1 + v_2 + v_3) = 2w_1 + w_2 + 2w_3$; $A$ times $(1, 1, 1)$ gives $(2, 1, 2)$.

6 $v = c(v_2 - v_3)$ gives $T(v) = 0$; nullspace is $(0, c, -c)$; solutions $(1, 0, 0) + (0, c, -c)$.

8 For $T^2(v)$ we would need to know $T(w)$. If the $w$'s equal the $v$'s, the matrix is $A^2$.
12 (c) is wrong because $w_1$ is not generally in the input space.

14 (a) $\begin{bmatrix}2&5\\1&3\end{bmatrix}$ (b) $\begin{bmatrix}3&-5\\-1&2\end{bmatrix}$ = inverse of (a) (c) $A\begin{bmatrix}2\\6\end{bmatrix}$ must be $2A\begin{bmatrix}1\\3\end{bmatrix}$.

18 $(a, b) = (\cos\theta, -\sin\theta)$. Minus sign from $Q^{-1} = Q^T$.

20 $w_2(x) = 1 - x^2$; $w_3(x) = \frac12(x^2 - x)$; $y = 4w_1 + 5w_2 + 6w_3$.

23 The matrix $M$ with these nine entries must be invertible.

27 If $T$ is not invertible, $T(v_1), \ldots, T(v_n)$ is not a basis. We couldn't choose $w_i = T(v_i)$.
30 $S$ takes $(x, y)$ to $(-x, y)$. $S(T(v)) = (-1, 2)$. $S(v) = (-2, 1)$ and $T(S(v)) = (1, -2)$.

34 The last step writes 6, 6, 2, 2 as the overall average 4, 4, 4, 4 plus the difference 2, 2,
$-2$, $-2$. Therefore $c_1 = 4$ and $c_2 = 2$ and $c_3 = 1$ and $c_4 = 1$.

35 The wavelet basis is $(1, 1, 1, 1, 1, 1, 1, 1)$ and the long wavelet and two medium wavelets
$(1, 1, -1, -1, 0, 0, 0, 0)$, $(0, 0, 0, 0, 1, 1, -1, -1)$ and 4 wavelets with a single pair $1, -1$.

36 If $Vb = Wc$ then $b = V^{-1}Wc$. The change of basis matrix is $V^{-1}W$.

37 Multiplication by $\begin{bmatrix}a&b\\c&d\end{bmatrix}$ with this basis is represented by 4 by 4 $A = \begin{bmatrix}aI&bI\\cI&dI\end{bmatrix}$.

38 If $w_1 = Av_1$ and $w_2 = Av_2$ then $a_{11} = a_{22} = 1$. All other entries will be zero.


Problem Set 7.3, page 406 


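1 $A^TA = \begin{bmatrix}10&20\\20&40\end{bmatrix}$ has $\lambda = 50$ and 0, $v_1 = \dfrac{1}{\sqrt5}\begin{bmatrix}1\\2\end{bmatrix}$, $v_2 = \dfrac{1}{\sqrt5}\begin{bmatrix}2\\-1\end{bmatrix}$; $\sigma_1 = \sqrt{50}$.
$Av_1 = \dfrac{1}{\sqrt5}\begin{bmatrix}5\\15\end{bmatrix} = \sigma_1u_1$ and $Av_2 = 0$. $u_1 = \dfrac{1}{\sqrt{10}}\begin{bmatrix}1\\3\end{bmatrix}$ and $AA^Tu_1 = 50u_1$.
3 $A = QH = \dfrac{1}{\sqrt{50}}\begin{bmatrix}7&-1\\1&7\end{bmatrix}\dfrac{1}{\sqrt{50}}\begin{bmatrix}10&20\\20&40\end{bmatrix}$. $H$ is semidefinite because $A$ is singular.
4 $A^+ = V\begin{bmatrix}1/\sqrt{50}&0\\0&0\end{bmatrix}U^T = \dfrac{1}{50}\begin{bmatrix}1&3\\2&6\end{bmatrix}$; $A^+A = \begin{bmatrix}.2&.4\\.4&.8\end{bmatrix}$, $AA^+ = \begin{bmatrix}.1&.3\\.3&.9\end{bmatrix}$.
7 $[\sigma_1u_1\ \ \sigma_2u_2]\begin{bmatrix}v_1^T\\v_2^T\end{bmatrix} = \sigma_1u_1v_1^T + \sigma_2u_2v_2^T$. In general this is $\sigma_1u_1v_1^T + \cdots + \sigma_ru_rv_r^T$.

9 $A^+$ is $A^{-1}$ because $A$ is invertible. Pseudoinverse equals inverse when $A^{-1}$ exists!
11 $A = [1][5\ 0\ 0]V^T$ and $A^+ = V\begin{bmatrix}.2\\0\\0\end{bmatrix} = \begin{bmatrix}.12\\.16\\0\end{bmatrix}$; $A^+A = \begin{bmatrix}.36&.48&0\\.48&.64&0\\0&0&0\end{bmatrix}$; $AA^+ = [1]$.
13 If $\det A = 0$ then rank$(A) < n$; thus rank$(A^+) < n$ and $\det A^+ = 0$.

16 $x^+$ in the row space of $A$ is perpendicular to $\hat x - x^+$ in the nullspace of $A^TA$ =
nullspace of $A$. The right triangle has $c^2 = a^2 + b^2$.

17 $AA^+p = p$, $AA^+e = 0$, $A^+Ax_r = x_r$, $A^+Ax_n = 0$.

19 $L$ is determined by $\ell_{21}$. Each eigenvector in $S$ is determined by one number. The
counts are $1 + 3$ for $LU$, $1 + 2 + 1$ for $LDU$, $1 + 3$ for $QR$, $1 + 2 + 1$ for $U\Sigma V^T$,
$2 + 2 + 0$ for $S\Lambda S^{-1}$.

22 Keep only the $r$ by $r$ corner $\Sigma_r$ of $\Sigma$ (the rest is all zero). Then $A = U\Sigma V^T$ has the
required form $A = UM_1\Sigma_rM_2^TV^T$ with an invertible $M = M_1\Sigma_rM_2^T$ in the middle.

23 $\begin{bmatrix}0&A\\A^T&0\end{bmatrix}\begin{bmatrix}u\\v\end{bmatrix} = \begin{bmatrix}Av\\A^Tu\end{bmatrix} = \sigma\begin{bmatrix}u\\v\end{bmatrix}$. The singular values of $A$ are
eigenvalues of this block matrix.


Problem Set 8.1, page 418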


3 The rows of the free-free matrix in equation (9) add to $[0\ 0\ 0]$ so the right side needs
$f_1 + f_2 + f_3 = 0$. $f = (-1, 0, 1)$ gives $c_2u_1 - c_2u_2 = -1$, $c_3u_2 - c_3u_3 = -1$, $0 = 0$.
Then $u_{\text{particular}} = (-c_2^{-1} - c_3^{-1}, -c_3^{-1}, 0)$. Add any multiple of $u_{\text{nullspace}} = (1, 1, 1)$.

4 $\displaystyle\int -\frac{d}{dx}\left(c(x)\frac{du}{dx}\right)dx = -\left[c(x)\frac{du}{dx}\right]_0^1 = 0$ (bdry cond) so we need $\displaystyle\int f(x)\,dx = 0$.

6 Multiply $A_1^TC_1A_1$ as columns of $A_1^T$ times $c$'s times rows of $A_1$. The first 3 by 3
"element matrix" $c_1E_1 = [1\ 0\ 0]^Tc_1[1\ 0\ 0]$ has $c_1$ in the top left corner.
8 The solution to $-u'' = 1$ with $u(0) = u(1) = 0$ is $u(x) = \frac12(x - x^2)$. At $x = \frac15, \frac25, \frac35, \frac45$
this gives $u = 2, 3, 3, 2$ (discrete solution in Problem 7) times $(\Delta x)^2 = 1/25$.
11 Forward/backward/centered for $du/dx$ has a big effect because that term has the large
coefficient. MATLAB: E = diag(ones(6, 1), 1); K = 64 * (2 * eye(7) - E - E');
D = 80 * (E - eye(7)); (K + D)\ones(7, 1); % forward; (K - D')\ones(7, 1);
% backward; (K + D/2 - D'/2)\ones(7, 1); % centered is usually the best: more
accurate


Problem Set 8.2, page 428 


-h 


—1 1 0 c 1 
A= E 0 1 | ; nullspace contains : | ; o] is not orthogonal to that nullspace. 
0 —i 1 c 0 


2 Ay = 0 for y = (1, —1, 1); current along edge 1, edge 3, back on edge 2 (full loop). 


5 Kirchhoff’s Current Law ATy = f is solvable for f = (1,—1,0) and not solvable 
for f = (1,0,0); f must be orthogonal to (1, 1, 1) in the nullspace: fi + fo+ f3 = 0. 


2 —] -i 3 1 c 
6 ATAx = [= 2 -i |x = | = f produces x = j-i + |: potentials 
—-1 -l 2 0 0 c 


x = 1,—1,0 and currents —Ax = 2, 1,—1; f sends 3 units from node 2 into node 1. 


1 3 —1 -2 1 5/4 c 
rar 2 a-l- s Tali g= | 0) yx =| 1 + any e| 
2 -2—2 4 —1 7/8 c 
potentials x = 3,1, 2 and currents —CAx = 4,3, 3. 
9 Elimination on Ax = b always leads to y^T b = 0 in the zero rows of U and R:
−b1 + b2 − b3 = 0 and b3 − b4 + b5 = 0 (those y’s are from Problem 8 in the left
nullspace). This is Kirchhoff’s Voltage Law around the two loops.
11 A^T A = [2 −1 −1 0; −1 3 −1 −1; −1 −1 3 −1; 0 −1 −1 2]. Diagonal entry = number of edges into the node;
the trace is 2 times the number of edges; off-diagonal entry = −1 if nodes are connected.
A^T A is the graph Laplacian, A^T CA is weighted by C.

13 A^T CAx = [4 −2 −2 0; −2 8 −3 −3; −2 −3 8 −3; 0 −3 −3 6] x = (1, 0, 0, −1) gives four potentials
x = (5/12, 1/6, 1/6, 0). I grounded x4 = 0 and solved for x; currents y = −CAx = (1/2, 1/2, 0, 1/2, 1/2).

17 (a) 8 independent columns (b) f must be orthogonal to the nullspace so f’s add
to zero (c) Each edge goes into 2 nodes, 12 edges make diagonal entries sum to 24.
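The weighted Laplacian in Problem 7 can be assembled directly from an edge list. This is a minimal sketch under stated assumptions: the triangle graph has edges 1→2, 1→3, 2→3 (written 0-based below) with conductances 1, 2, 2, and the potentials 5/4, 1, 7/8 are taken from the printed answer with c = 0.

```python
from fractions import Fraction as F

edges = [(0, 1), (0, 2), (1, 2)]   # assumed: edge k goes from node i to node j
c = [1, 2, 2]                      # assumed conductances (diagonal of C)
n = 3

# Incidence matrix A: -1 in column i and +1 in column j for each edge.
A = [[0] * n for _ in edges]
for k, (i, j) in enumerate(edges):
    A[k][i], A[k][j] = -1, 1

# K = A^T C A, entry by entry.
K = [[sum(A[k][r] * c[k] * A[k][s] for k in range(len(edges))) for s in range(n)]
     for r in range(n)]

x = [F(5, 4), F(1), F(7, 8)]       # potentials from the answer to Problem 7
f = [sum(K[r][s] * x[s] for s in range(n)) for r in range(n)]
currents = [-c[k] * sum(A[k][s] * x[s] for s in range(n)) for k in range(len(edges))]
print(K, f, currents)
```

Adding any constant to all three potentials leaves f and the currents unchanged, which is the (c, c, c) nullspace in the answers above.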


Problem Set 8.3, page 437 


afs Jf! J pe- ge ILE geg 


3 λ = 1 and .8, x = (1, 0); 1 and −.8, x = (5/9, 4/9); 1, 1/4, and 1/4, x = (1/3, 1/3, 1/3).


5 The steady state eigenvector for λ = 1 is (0, 0, 1) = everyone is dead.


6 Add the components of Ax = λx to find sum s = λs. If λ ≠ 1 the sum must be s = 0.


7 (.5)^k → 0 gives A^k → A^∞; any A = [.6 + .4a, .6 − .6a; .4 − .4a, .4 + .6a] with a ≤ 1 and .4 + .6a ≥ 0.
9 M² is still nonnegative; [1 ··· 1]M = [1 ··· 1] so multiply on the right by M to
find [1 ··· 1]M² = [1 ··· 1] ⇒ columns of M² add to 1.


10 λ = 1 and a + d − 1 from the trace; steady state is a multiple of x1 = (b, 1 − a).


12 B has λ = 0 and −.5 with x1 = (.3, .2) and x2 = (−1, 1); A has λ = 1 so A − I has
λ = 0. e^{−.5t} approaches zero and the solution approaches c1 e^{0t} x1 = c1x1.


13 x = (1, 1, 1) is an eigenvector when the row sums are equal; Ax = (.9, .9, .9). 


6 32 
16 λ = 1 (Markov), 0 (singular), .2 (from trace). Steady state (.3, .3, .4) and (30, 30, 40).


17 No, A has an eigenvalue λ = 1 and (I − A)^{-1} does not exist.


15 The first two A’s have Amax < 1; p = B and oh I- È o] has no inverse. 


19 Λ times S^{-1}ΔS has the same diagonal as S^{-1}ΔS times Λ because Λ is diagonal.
20 If B > A > 0 and Ax = λmax(A)x > 0 then Bx > λmax(A)x and λmax(B) > λmax(A).
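The convergence A^k → A^∞ in Problem 7 is easy to watch numerically. The matrix below is an assumption chosen for illustration: it is a Markov matrix with eigenvalues 1 and .5 and steady state (.6, .4), and the helper `matmul` is hypothetical.

```python
def matmul(X, Y):
    """Multiply two square matrices stored as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

A = [[0.8, 0.3],
     [0.2, 0.7]]          # assumed example: columns add to 1, eigenvalues 1 and .5

P = [[1.0, 0.0], [0.0, 1.0]]
for _ in range(60):       # P becomes A^60; the (.5)^60 part is negligible
    P = matmul(P, A)

steady = [P[0][0], P[1][0]]   # every column of A^k approaches the steady state
print(P, steady)
```

Both columns of the high power approach the same steady state vector, so the starting distribution is forgotten.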


Problem Set 8.4, page 446 


1 Feasible set = line segment (6, 0) to (0, 3); minimum cost at (6, 0), maximum at (0, 3). 
2 Feasible set has corners (0, 0), (6, 0), (2, 2), (0, 6). Minimum cost 2x — y at (6, 0). 
3 Only two corners (4, 0, 0) and (0, 2, 0); let x1 → −∞, x2 = 0, and x3 = x1 − 4.


4 From (0, 0, 2) move to x = (0, 1, 1.5) with the constraint x1 + x2 + 2x3 = 4. The new
cost is 3(1) + 8(1.5) = $15 so r = −1 is the reduced cost. The simplex method also
checks x = (1, 0, 1.5) with cost 5(1) + 8(1.5) = $17; r = 1 means more expensive.

5 c = [3 5 7] has minimum cost 12 by the Ph.D. since x = (4, 0, 0) is minimizing.
The dual problem maximizes 4y subject to y ≤ 3, y ≤ 5, y ≤ 7. Maximum = 12.
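The equality of the primal minimum and dual maximum in Problem 5 can be confirmed by brute force. This sketch assumes the feasible set is x1 + x2 + x3 = 4 with x ≥ 0 (implied by the corner (4, 0, 0) and the dual objective 4y); the variable names are illustrative.

```python
cost = [3, 5, 7]

# Corners of the assumed feasible set x1 + x2 + x3 = 4, x >= 0.
corners = [[4 if i == j else 0 for j in range(3)] for i in range(3)]
primal = min(sum(ci * xi for ci, xi in zip(cost, x)) for x in corners)

# Dual: maximize 4y subject to y <= 3, y <= 5, y <= 7.
y = min(cost)
dual = 4 * y
print(primal, dual)
```

The minimum over corners is enough here because a linear cost on this bounded feasible set is minimized at a corner.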


8 y^T b ≤ y^T Ax = (A^T y)^T x ≤ c^T x. The first inequality needed y ≥ 0 and Ax − b ≥ 0.

Problem Set 8.5, page 451


1 ∫_0^{2π} cos((j + k)x) dx = [sin((j + k)x)/(j + k)]_0^{2π} = 0 and similarly ∫_0^{2π} cos((j − k)x) dx = 0.
Notice j − k ≠ 0 in the denominator. If j = k then ∫_0^{2π} cos² jx dx = π.


4 ∫_{−1}^{1} (1)(x³ − cx) dx = 0 and ∫_{−1}^{1} (x² − 1/3)(x³ − cx) dx = 0 for all c (odd functions).
Choose c so that ∫_{−1}^{1} x(x³ − cx) dx = [x⁵/5 − cx³/3]_{−1}^{1} = 2/5 − 2c/3 = 0. Then c = 3/5.

5 The integrals lead to the Fourier coefficients a1 = 0, b1 = 4/π, b2 = 0.

6 From eqn. (3) a_k = 0 and b_k = 4/πk (odd k). The square wave has ||f||² = 2π.
Then eqn. (6) is 2π = π(16/π²)(1/1² + 1/3² + 1/5² + ···). That infinite series equals π²/8.

8 ||v||² = 1 + 1/2 + 1/4 + 1/8 + ··· = 2 so ||v|| = √2; ||v||² = 1 + a² + a⁴ + ··· = 1/(1 − a²)
so ||v|| = 1/√(1 − a²); ∫_0^{2π} (1 + 2 sin x + sin² x) dx = 2π + 0 + π so ||f|| = √(3π).

9 (a) f(x) = (1 + square wave)/2 so the a’s are 1/2, 0, 0, ... and the b’s are 2/π, 0,
2/3π, 0, 2/5π, ... (b) a0 = ∫_0^{2π} x dx/2π = π, all other a_k = 0, b_k = −2/k.

11 cos² x = 1/2 + (1/2) cos 2x; cos(x + π/3) = cos x cos(π/3) − sin x sin(π/3) = (1/2) cos x − (√3/2) sin x.


13 a0 = (1/2π) ∫ F(x) dx = 1/2π, a_k = sin(kh/2)/(πkh/2) → 1/π for delta function; all b_k = 0.
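The series in Problem 6 can be summed numerically as a sanity check. A minimal sketch, assuming the series is the sum of 1/k² over odd k and that the energy identity is 2π = π(16/π²) times that sum; the number of terms is an arbitrary choice.

```python
import math

# Partial sum of 1/1^2 + 1/3^2 + 1/5^2 + ... over the first 200000 odd integers.
partial = sum(1.0 / (2 * m + 1) ** 2 for m in range(200000))
target = math.pi ** 2 / 8

# Energy identity for the square wave: pi * (16/pi^2) * (series) should equal 2*pi.
energy = math.pi * (16 / math.pi ** 2) * target
print(partial, target, energy, 2 * math.pi)
```

The partial sums converge slowly (the tail after N terms is about 1/(4N)), which is why many terms are needed for a few digits.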


Problem Set 8.6, page 458 


3 If σ3 = 0 the third equation is exact.

4 0, 1, 2 have probabilities 1/4, 1/2, 1/4 and σ² = (0 − 1)²(1/4) + (1 − 1)²(1/2) + (2 − 1)²(1/4) = 1/2.

5 Mean (1/2, 1/2). Independent flips lead to Σ = diag(1/4, 1/4). Trace = σ²_total = 1/2.

6 Mean m = p0 and variance σ² = (1 − p0)² p0 + (0 − p0)²(1 − p0) = p0(1 − p0).

7 Minimize P = a²σ1² + (1 − a)²σ2² at P′ = 2aσ1² − 2(1 − a)σ2² = 0; a = σ2²/(σ1² + σ2²)
recovers equation (2) for the statistically correct choice with minimum variance.

8 Multiply LΣL^T = (A^T Σ^{-1} A)^{-1} A^T Σ^{-1} Σ Σ^{-1} A (A^T Σ^{-1} A)^{-1} = P = (A^T Σ^{-1} A)^{-1}.

9 Row 3 = −row 1 and row 4 = −row 2: A has rank 2.
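The coin-flip variance and the minimum-variance weight can both be reproduced by direct enumeration. This is an illustrative sketch, assuming two independent fair flips counted by number of heads; `best_weight` is a hypothetical helper for the weight a on the first of two measurements.

```python
from fractions import Fraction
from itertools import product

half = Fraction(1, 2)

# Distribution of the number of heads in two fair, independent flips.
prob = {}
for flips in product([0, 1], repeat=2):
    heads = sum(flips)
    prob[heads] = prob.get(heads, 0) + half * half

mean = sum(k * p for k, p in prob.items())
variance = sum((k - mean) ** 2 * p for k, p in prob.items())

def best_weight(var1, var2):
    """Weight a minimizing a^2 * var1 + (1 - a)^2 * var2."""
    return Fraction(var2, var1 + var2)

def combined_variance(a, var1, var2):
    return a * a * var1 + (1 - a) * (1 - a) * var2

a_star = best_weight(1, 3)
print(prob, mean, variance, a_star)
```

The more reliable measurement (smaller variance) receives the larger weight, and the combined variance is smaller than either one alone.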


Problem Set 8.7, page 464 


1 (x, y, z) has homogeneous coordinates (cx, cy, cz, c) for c = 1 and all c ≠ 0.
4 S = diag(c, c, c, 1); row 4 of ST and TS is 1, 4, 3, 1 and c, 4c, 3c, 1; use vTS!


5 S = diag(1/8.5, 1/11, 1) for a 1 by 1 square, starting from an 8.5 by 11 page.


9 n = (2/3, 2/3, 1/3) has P = I − nn^T = (1/9)[5 −4 −2; −4 5 −2; −2 −2 8]. Notice ||n|| = 1.

10 We can choose (0, 0, 3) on the plane and multiply T_− P T_+ = (1/9)[5 −4 −2 0; −4 5 −2 0; −2 −2 8 0; 6 6 3 9].


11 (3, 3, 3) projects to (1/3)(−1, −1, 4) and (3, 3, 3, 1) projects to (1/3, 1/3, 5/3, 1). Row vectors!
13 That projection of a cube onto a plane produces a hexagon. 


14 (3, 3, 3)(I − 2nn^T) = (1/3, 1/3, 1/3)[1 −8 −4; −8 1 −4; −4 −4 7] = (−11/3, −11/3, −1/3).


15 (3, 3, 3, 1) → (3, 3, 0, 1) → (−7/3, −7/3, −8/3, 1) → (−7/3, −7/3, 1/3, 1).
17 Space is rescaled by 1/c because (x, y, z, c) is the same point as (x/c, y/c, z/c, 1).
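The homogeneous-coordinate projection (translate, project, translate back) can be checked with exact fractions. A minimal sketch, assuming the unit normal n = (2/3, 2/3, 1/3), a plane through (0, 0, 3), and the row-vector convention; the helper names `row_times`, `embed`, and `matmul` are hypothetical.

```python
from fractions import Fraction as F

n = [F(2, 3), F(2, 3), F(1, 3)]                      # assumed unit normal vector
P = [[(1 if i == j else 0) - n[i] * n[j] for j in range(3)] for i in range(3)]

def row_times(v, M):
    """Row vector v times matrix M."""
    return [sum(v[i] * M[i][j] for i in range(len(v))) for j in range(len(M[0]))]

def matmul(X, Y):
    return [row_times(r, Y) for r in X]

def embed(M3, shift=(0, 0, 0)):
    """4 by 4 homogeneous matrix: M3 in the corner, translation in row 4."""
    rows = [list(r) + [0] for r in M3]
    rows.append(list(shift) + [1])
    return rows

I3 = [[1 if i == j else 0 for j in range(3)] for i in range(3)]
T_minus, T_plus = embed(I3, (0, 0, -3)), embed(I3, (0, 0, 3))

M = matmul(matmul(T_minus, embed(P)), T_plus)        # T_- P T_+
projected = row_times([3, 3, 3], P)                  # plane through the origin
projected_h = row_times([3, 3, 3, 1], M)             # plane through (0, 0, 3)
print(projected, projected_h)
```

The projected point satisfies n · x = 1, the equation of the shifted plane, which is a quick independent check on T_− P T_+.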


Problem Set 9.1, page 472 


1 Without exchange, pivots .001 and 1000; with exchange, 1 and −1. When the pivot is
larger than the entries below it, all |ℓ_ij| = |entry/pivot| ≤ 1. A = [1 1 1; 0 1 −1; −1 1 1].


4 The largest ||x|| = ||A^{-1}b|| is ||A^{-1}|| = 1/λmin since A^T = A; largest error 10^{-16}/λmin.


5 Each row of U has at most w entries. Then w multiplications to substitute components 
of x (already known from below) and divide by the pivot. Total for n rows < wn. 


6 The triangular L^{-1}, U^{-1}, R^{-1} need ½n² multiplications. Q needs n² to multiply the
right side by Q^{-1} = Q^T. So QRx = b takes 1.5 times longer than LUx = b.

7 UU^{-1} = I: Back substitution needs ½j² multiplications on column j, using the j
by j upper left block. Then ½(1² + 2² + ··· + n²) ≈ ½(⅓n³) = total to find U^{-1}.


10 With 16-digit floating point arithmetic the errors ||x − x_computed|| for ε = 10^{-3}, 10^{-6},
10^{-9}, 10^{-12}, 10^{-15} are of order 10^{-16}, 10^{-11}, 10^{-7}, 10^{-4}, 10^{-3}.


11 (a) cos θ = 1/√10, sin θ = −3/√10, R = Q21 A = (1/√10)[10 14; 0 8] (b) λ = 4; use −θ; x = (1, −3)/√10.


13 Q_ij A uses 4n multiplications (2 for each entry in rows i and j). By factoring out cos θ,
the entries 1 and ± tan θ need only 2n multiplications, which leads to ⅔n³ for QR.
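The pivots quoted in Problem 1 can be reproduced with one elimination step. The 2 by 2 matrix below is an assumption, chosen only because it gives pivots .001 and 1000 without a row exchange and 1 and −1 with one; `pivots_2x2` is a hypothetical helper.

```python
from fractions import Fraction as F

def pivots_2x2(A):
    """Return (first pivot, second pivot, multiplier) for one elimination step."""
    (a, b), (c, d) = A
    multiplier = c / a
    return a, d - multiplier * b, multiplier

A = [[F(1, 1000), F(0)],
     [F(1), F(1000)]]                        # assumed example matrix

no_exchange = pivots_2x2(A)                  # tiny pivot, huge multiplier
with_exchange = pivots_2x2([A[1], A[0]])     # larger entry moved into the pivot position
print(no_exchange, with_exchange)
```

After the exchange the multiplier is .001 instead of 1000, which is the point of partial pivoting: every |ℓ_ij| stays at or below 1.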


Problem Set 9.2, page 478 


1 ||A|| = 2, ||A^{-1}|| = 2, c = 4; ||A|| = 3, ||A^{-1}|| = 1, c = 3; ||A|| = 2 + √2 =
λmax for positive definite A, ||A^{-1}|| = 1/λmin, c = (2 + √2)/(2 − √2) = 5.83.

3 For the first inequality replace x by Bx in ||Ax|| ≤ ||A|| ||x||; the second inequality is
just ||Bx|| ≤ ||B|| ||x||. Then ||AB|| = max(||ABx||/||x||) ≤ ||A|| ||B||.


7 The triangle inequality gives ||Ax + Bx|| ≤ ||Ax|| + ||Bx||. Divide by ||x|| and take
the maximum over all nonzero vectors to find ||A + B|| ≤ ||A|| + ||B||.
8 If Ax = λx then ||Ax||/||x|| = |λ| for that particular vector x. When we maximize
the ratio over all vectors we get ||A|| ≥ |λ|.


13 The residual b − Ay = (10^{-7}, 0) is much smaller than b − Az = (.0013, .0016). But
z is much closer to the solution than y.


14 det A = 10^{-6} so A^{-1} = 10³[659 −563; −913 780]: ||A|| > 1, ||A^{-1}|| > 10⁶, then c > 10⁶.


16 x1² + ··· + xn² is not smaller than max(xi²) and not larger than (|x1| + ··· + |xn|)² = ||x||1².
x1² + ··· + xn² ≤ n max(xi²) so ||x|| ≤ √n ||x||∞. Choose yi = sign xi = ±1 to get
||x||1 = x · y ≤ ||x|| ||y|| = √n ||x||. x = (1, ..., 1) has ||x||1 = √n ||x||.
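The ill-conditioning in Problem 14 shows up already in exact arithmetic. In this sketch the entries of A are an assumption, recovered from the printed inverse and determinant, and the condition number uses the infinity norm (largest absolute row sum) as a stand-in for the norm in the text.

```python
from fractions import Fraction as F

A = [[F(780, 1000), F(563, 1000)],
     [F(913, 1000), F(659, 1000)]]           # assumed matrix for Problem 14

det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
A_inv = [[ A[1][1] / det, -A[0][1] / det],
         [-A[1][0] / det,  A[0][0] / det]]   # 2 by 2 inverse from the cofactor formula

def norm_inf(M):
    """Largest absolute row sum (the infinity norm, swapped in for simplicity)."""
    return max(sum(abs(v) for v in row) for row in M)

cond = norm_inf(A) * norm_inf(A_inv)
print(det, A_inv, float(cond))
```

A determinant of 10^{-6} with entries of order 1 forces entries of order 10⁶ in the inverse, so the condition number exceeds 10⁶ in this norm as well.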


Problem Set 9.3, page 489 
2 If Ax = λx then (I − A)x = (1 − λ)x. Real eigenvalues of B = I − A have |1 − λ| < 1
provided λ is between 0 and 2.
6 Jacobi has S^{-1}T = (1/3)[0 1; 1 0] with |λ|max = 1/3. Small problem, fast convergence.


7 Gauss-Seidel has S^{-1}T = [0 1/3; 0 1/9] with |λ|max = 1/9 which is (|λ|max for Jacobi)².


9 Set the trace 2 − 2ω + ¼ω² equal to (ω − 1) + (ω − 1) to find ω_opt = 4(2 − √3) ≈ 1.07.
The eigenvalues ω − 1 are about .07, a big improvement.


15 In the jth component of Ax1, λ1 sin(jπ/(n+1)) = 2 sin(jπ/(n+1)) − sin((j−1)π/(n+1)) − sin((j+1)π/(n+1)).
The last two terms combine into −2 sin(jπ/(n+1)) cos(π/(n+1)). Then λ1 = 2 − 2 cos(π/(n+1)).


17 A^{-1} = (1/3)[2 1; 1 2] gives u1 = (1/3)(2, 1), u2 = (1/9)(5, 4), u3 = (1/27)(14, 13) → u∞ = (1/2, 1/2).

18 R = Q^T A = [1, cos θ sin θ; 0, −sin² θ] and A1 = RQ = [cos θ(1 + sin² θ), −sin³ θ; −sin³ θ, −cos θ sin² θ].


20 If A − cI = QR then A1 = RQ + cI = Q^{-1}(QR + cI)Q = Q^{-1}AQ. No change
in eigenvalues because A1 is similar to A.
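The inverse power iterates in Problem 17 can be generated exactly. This is a sketch under two assumptions: A^{-1} = (1/3)[2 1; 1 2] and the starting vector u0 = (1, 0), which is consistent with the first iterate (2/3, 1/3). The helper `apply` is hypothetical.

```python
from fractions import Fraction as F

A_inv = [[F(2, 3), F(1, 3)],
         [F(1, 3), F(2, 3)]]

def apply(M, v):
    """Matrix times column vector."""
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

u = [F(1), F(0)]                 # assumed starting vector u0
history = []
for _ in range(3):               # u1, u2, u3
    u = apply(A_inv, u)
    history.append(u)

for _ in range(40):              # keep iterating toward the limit
    u = apply(A_inv, u)
print(history, [float(v) for v in u])
```

The eigenvalues of A^{-1} are 1 and 1/3, so the component along (1, −1) shrinks by a factor of 3 at every step and only (1/2, 1/2) survives.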


21 Multiply Aq_j = b_{j−1}q_{j−1} + a_j q_j + b_j q_{j+1} by q_j^T to find q_j^T Aq_j = a_j (because the
q’s are orthonormal). The matrix form (multiplying by columns) is AQ = QT where
T is tridiagonal. The entries down the diagonals of T are the a’s and b’s.


23 If A is symmetric then A1 = Q^{-1}AQ = Q^T AQ is also symmetric. A1 = RQ =
R(QR)R^{-1} = RAR^{-1} has R and R^{-1} upper triangular, so A1 cannot have nonzeros
on a lower diagonal than A. If A is tridiagonal and symmetric then (by using symmetry
for the upper part of A1) the matrix A1 = RAR^{-1} is also tridiagonal.


26 If each center a_ii is larger than the circle radius r_i (this is diagonal dominance), then
0 is outside all circles: not an eigenvalue so A^{-1} exists.

Problem Set 10.1, page 498

2 In polar form these are √5 e^{iθ}, 5e^{2iθ}, (1/√5)e^{−iθ}, √5.

4 |z × w| = 6, |z + w| ≤ 5, |z/w| = 2/3, |z − w| ≤ 5.

5 a + ib = √3/2 + (1/2)i, 1/2 + (√3/2)i, i, −1/2 + (√3/2)i; w^{12} = 1.

9 2 + i; (2 + i)(1 + i) = 1 + 3i; e^{−iπ/2} = −i; e^{−iπ} = −1; (1 − i)/(1 + i) = −i; (−i)^{103} = i.
10 z + z̄ is real; z − z̄ is pure imaginary; z z̄ is positive; z/z̄ has absolute value 1.


12 (a) When a = b = d = 1 the square root becomes √(4c); λ is complex if c < 0
(b) λ = 0 and λ = a + d when ad = bc (c) the λ’s can be real and different.


13 Complex λ’s when (a + d)² < 4(ad − bc); write (a + d)² − 4(ad − bc) as (a − d)² + 4bc
which is positive when bc > 0.


14 det(P − λI) = λ⁴ − 1 = 0 has λ = 1, −1, i, −i with eigenvectors (1, 1, 1, 1) and
(1, −1, 1, −1) and (1, i, −1, −i) and (1, −i, −1, i) = columns of Fourier matrix.

16 The symmetric block matrix has real eigenvalues; so iλ is real and λ is pure imaginary.

18 r = 1, angle π/2 − θ; multiply by e^{iθ} to get e^{iπ/2} = i.

21 cos 3θ = Re[(cos θ + i sin θ)³] = cos³ θ − 3 cos θ sin² θ; sin 3θ = 3 cos² θ sin θ − sin³ θ.

23 e^i is at angle θ = 1 on the unit circle; |i^e| = 1^e; infinitely many i^e = e^{i(π/2 + 2πn)e}.

24 (a) Unit circle (b) Spiral into e^{−2π} (c) Circle continuing around to angle θ = 2π².
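Identities of the kind listed in Problems 9 and 21 are quick to confirm with complex arithmetic. This is a hedged sketch: the product (2 + i)(1 + i) is from the text, while the quotient (1 − i)/(1 + i), the power (−i)^103 and the exponents −π/2 and −π are assumptions where the printed answers are hard to read. `triple_angle_gap` is a hypothetical helper.

```python
import cmath
import math

checks = {
    "(2+i)(1+i)": (2 + 1j) * (1 + 1j),                 # expect 1 + 3i
    "(1-i)/(1+i)": (1 - 1j) / (1 + 1j),                # expect -i
    "(-i)^103": (-1j) ** 103,                          # expect i, since 103 = 4*25 + 3
    "e^(-i pi/2)": cmath.exp(-1j * cmath.pi / 2),      # expect -i
    "e^(-i pi)": cmath.exp(-1j * cmath.pi),            # expect -1
}

def triple_angle_gap(theta):
    """Difference between cos(3 theta) and cos^3 - 3 cos sin^2 (de Moivre, real part)."""
    c, s = math.cos(theta), math.sin(theta)
    return math.cos(3 * theta) - (c ** 3 - 3 * c * s ** 2)

for name, value in checks.items():
    print(name, value)
```

Floating point gives these values only up to rounding error, so comparisons should use a small tolerance rather than exact equality.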


Problem Set 10.2, page 506 


3 z = multiple of (1 + i, 1 + i, −2); Az = 0 gives z^H A^H = 0^H so z (not z̄!) is orthogonal
to all columns of A^H (using complex inner product z^H times columns of A^H).
4 The four fundamental subspaces are now C(A), N(A), C(A^H), N(A^H). A^H and not A^T.
5 (a) (A^H A)^H = A^H A^{HH} = A^H A again (b) If A^H Az = 0 then (z^H A^H)(Az) = 0.
This is ||Az||² = 0 so Az = 0. The nullspaces of A and A^H A are always the same.
6 (a) False: A = U = [0 1; −1 0] (b) True: −i is not an eigenvalue when A = A^H.


10 (1, 1, 1), (1, e^{2πi/3}, e^{4πi/3}), (1, e^{4πi/3}, e^{2πi/3}) are orthogonal (complex inner product!)
because P is an orthogonal matrix, and therefore its eigenvector matrix is unitary.


11 C = [2 5 4; 4 2 5; 5 4 2] = 2 + 5P + 4P² has the Fourier eigenvector matrix F.
The eigenvalues are 2 + 5 + 4 = 11, 2 + 5e^{2πi/3} + 4e^{4πi/3}, 2 + 5e^{4πi/3} + 4e^{8πi/3}.
13 Determinant = product of the eigenvalues (all real). And A = A^H gives det A = complex conjugate of det A.


15 A = (1/√3)[1, −1 + i; 1 + i, 1] [2 0; 0 −1] (1/√3)[1, 1 − i; −1 − i, 1].

18 V = (1/L)[1 + √3, −1 + i; 1 + i, 1 + √3] [1 0; 0 −1] (1/L)[1 + √3, 1 − i; −1 − i, 1 + √3] with L² = 6 + 2√3.
Unitary means |λ| = 1. V = V^H gives real λ. Then trace zero gives λ = 1 and −1.


19 The v’s are columns of a unitary matrix U, so U^H is U^{-1}. Then z = UU^H z =
(multiply by columns) = v1(v1^H z) + ··· + vn(vn^H z): a typical orthonormal expansion.

20 Don’t multiply (e^{−ix})(e^{ix}). Conjugate the first, then ∫_0^{2π} e^{2ix} dx = [e^{2ix}/2i]_0^{2π} = 0.

21 R + iS = (R + iS)^H = R^T − iS^T; R is symmetric but S is skew-symmetric.

24 [1] and [−1]; any [e^{iθ}]; [a, b + ic; b − ic, d]; [w, e^{iφ}z; −z̄, e^{iφ}w̄] with |w|² + |z|² = 1
and any angle φ.

27 Unitary U^H U = I means (A^T − iB^T)(A + iB) = (A^T A + B^T B) + i(A^T B − B^T A) = I.
A^T A + B^T B = I and A^T B − B^T A = 0 which makes the block matrix orthogonal.


30 A = [1 − i, 1 − i; −1, 2] [1 0; 0 4] (1/6)[2 + 2i, −2; 1 + i, 2] = SΛS^{-1}. Note real λ = 1 and 4.
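The circulant eigenvalues in Problem 11 can be verified against the Fourier eigenvectors. This sketch assumes P is the cyclic permutation with ones above the diagonal and in the lower left corner (consistent with the printed C); `matmul` and `matvec` are hypothetical helpers.

```python
import cmath

P = [[0, 1, 0],
     [0, 0, 1],
     [1, 0, 0]]                      # assumed cyclic permutation

def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

P2 = matmul(P, P)
C = [[2 * (i == j) + 5 * P[i][j] + 4 * P2[i][j] for j in range(3)] for i in range(3)]

w = cmath.exp(2j * cmath.pi / 3)     # primitive cube root of 1
pairs = []
for k in range(3):
    v = [1, w ** k, w ** (2 * k)]            # column k of the Fourier matrix
    lam = 2 + 5 * w ** k + 4 * w ** (2 * k)  # candidate eigenvalue
    residual = max(abs(a - lam * b) for a, b in zip(matvec(C, v), v))
    pairs.append((lam, residual))
print(C, pairs)
```

Each residual ||Cv − λv|| is at the level of rounding error, so the Fourier columns are eigenvectors of every matrix of the form c0 I + c1 P + c2 P².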


Problem Set 10.3, page 514 


8 c → (1, 1, 1, 1, 0, 0, 0, 0) → (4, 0, 0, 0, 0, 0, 0, 0) → (4, 0, 0, 0, 4, 0, 0, 0) = F8c.
C → (0, 0, 0, 0, 1, 1, 1, 1) → (0, 0, 0, 0, 4, 0, 0, 0) → (4, 0, 0, 0, −4, 0, 0, 0) = F8C.
9 If w^{64} = 1 then w² is a 32nd root of 1 and √w is a 128th root of 1: Key to FFT.
13 e1 = c0 + c1 + c2 + c3 and e2 = c0 + c1 i + c2 i² + c3 i³; E contains the four
eigenvalues of C = FEF^{-1} because F contains the eigenvectors.
14 Eigenvalues e1 = 2 − 1 − 1 = 0, e2 = 2 − i − i³ = 2, e3 = 2 − (−1) − (−1) = 4,
e4 = 2 − i³ − i⁹ = 2. Just transform column 0 of C. Check trace 0 + 2 + 4 + 2 = 8.
15 Diagonal E needs n multiplications, Fourier matrix F and F^{-1} need ½ n log2 n
multiplications each by the FFT. The total is much less than the ordinary n² for C times x.
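The three FFT steps in Problem 8 (reorder into evens and odds, transform each half, recombine) correspond to a short recursion. This is a minimal sketch, assuming the inputs are c = (1, 0, 1, 0, 1, 0, 1, 0) and C = (0, 1, 0, 1, 0, 1, 0, 1), the convention w = e^{2πi/n}, and column 0 = (2, −1, 0, −1) for the circulant in Problem 14; the function name `fft` is illustrative.

```python
import cmath

def fft(c):
    """Recursive radix-2 transform y = F_n c with w = exp(2*pi*i/n); n must be a power of 2."""
    n = len(c)
    if n == 1:
        return list(c)
    even = fft(c[0::2])               # transform of the even-numbered components
    odd = fft(c[1::2])                # transform of the odd-numbered components
    y = [0] * n
    for j in range(n // 2):
        twiddle = cmath.exp(2j * cmath.pi * j / n) * odd[j]
        y[j] = even[j] + twiddle
        y[j + n // 2] = even[j] - twiddle
    return y

y1 = fft([1, 0, 1, 0, 1, 0, 1, 0])    # assumed input c for Problem 8
y2 = fft([0, 1, 0, 1, 0, 1, 0, 1])    # assumed input C for Problem 8
eigs = fft([2, -1, 0, -1])            # assumed column 0 of the circulant in Problem 14
print(y1, y2, eigs)
```

Each level of the recursion uses n/2 multiplications by powers of w, and there are log2 n levels, which gives the ½ n log2 n count in Problem 15.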


Conceptual Questions for Review 


Chapter 1 


1.1 Which vectors are linear combinations of v = (3, 1) and w = (4, 3)? 


1.2 Compare the dot product of v = (3, 1) and w = (4, 3) to the product of their lengths. 
Which is larger? Whose inequality? 


1.3 What is the cosine of the angle between v and w in Question 1.2? What is the cosine 
of the angle between the x-axis and v? 


Chapter 2 


2.1 Multiplying a matrix A times the column vector x = (2, —1) gives what combination 
of the columns of A? How many rows and columns in A? 


2.2 If Ax = b then the vector b is a linear combination of what vectors from the matrix
A? In vector space language, b lies in the ____ space of A.


2.3 If A is the 2 by 2 matrix [2 1; 6 6] what are its pivots?


2.4 If A is the matrix [0 1; 1 1] how does elimination proceed? What permutation matrix P
is involved?


2.5 If A is the matrix [2 1; 6 3] find b and c so that Ax = b has no solution and Ax = c has
a solution.


2.6 What 3 by 3 matrix L adds 5 times row 2 to row 3 and then adds 2 times row 1 to 
row 2, when it multiplies a matrix with three rows? 


2.7 What 3 by 3 matrix E subtracts 2 times row 1 from row 2 and then subtracts 5 times
row 2 from row 3? How is E related to L in Question 2.6?


2.8 If A is 4 by 3 and B is 3 by 7, how many row times column products go into AB? 
How many column times row products go into AB? How many separate small mul- 
tiplications are involved (the same for both)? 


2.9 Suppose A = [I U; 0 I] is a matrix with 2 by 2 blocks. What is the inverse matrix?


2.10 How can you find the inverse of A by working with [A I]? If you solve the n
equations Ax = columns of I then the solutions x are columns of ____.


2.11 How does elimination decide whether a square matrix A is invertible? 


2.12 Suppose elimination takes A to U (upper triangular) by row operations with the 
multipliers in L (lower triangular). Why does the last row of A agree with the last 
row of L times U? 


2.13 What is the factorization (from elimination with possible row exchanges) of any 
square invertible matrix? 


2.14 What is the transpose of the inverse of AB?


2.15 How do you know that the inverse of a permutation matrix is a permutation matrix? 
How is it related to the transpose? 


Chapter 3 


3.1 What is the column space of an invertible n by n matrix? What is the nullspace of 
that matrix? 


3.2 If every column of A is a multiple of the first column, what is the column space of 
A? 


3.3 What are the two requirements for a set of vectors in R^n to be a subspace?


3.4 If the row reduced form R of a matrix A begins with a row of ones, how do you know 
that the other rows of R are zero and what is the nullspace? 


3.5 Suppose the nullspace of A contains only the zero vector. What can you say about 
solutions to Ax = b? 


3.6 From the row reduced form R, how would you decide the rank of A? 


3.7 Suppose column 4 of A is the sum of columns 1, 2, and 3. Find a vector in the 
nullspace. 


3.8 Describe in words the complete solution to a linear system Ax = b. 

3.9 If Ax = b has exactly one solution for every b, what can you say about A? 
3.10 Give an example of vectors that span R² but are not a basis for R².
3.11 What is the dimension of the space of 4 by 4 symmetric matrices?


3.12 Describe the meaning of basis and dimension of a vector space.
3.13 Why is every row of A perpendicular to every vector in the nullspace?
3.14 How do you know that a column u times a row v^T (both nonzero) has rank 1?


3.15 What are the dimensions of the four fundamental subspaces, if A is 6 by 3 with rank 
2? 


3.16 What is the row reduced form R of a 3 by 4 matrix of all 2’s? 
3.17 Describe a pivot column of A. 
3.18 True? The vectors in the left nullspace of A have the form AT y. 


3.19 Why do the columns of every invertible matrix yield a basis? 


Chapter 4 


4.1 What does the word complement mean about orthogonal subspaces? 


4.2 If V is a subspace of the 7-dimensional space R^7, the dimensions of V and its
orthogonal complement add to ____.


4.3 The projection of b onto the line through a is the vector ____.
4.4 The projection matrix onto the line through a is P = ____.


4.5 The key equation to project b onto the column space of A is the normal equation ____.


4.6 The matrix A^T A is invertible when the columns of A are ____.
4.7 The least squares solution to Ax = b minimizes what error function?


4.8 What is the connection between the least squares solution of Ax = b and the idea of 
projection onto the column space? 


4.9 If you graph the best straight line to a set of 10 data points, what shape is the matrix 
A and where does the projection p appear in the graph? 


4.10 If the columns of Q are orthonormal, why is Q^T Q = I?
4.11 What is the projection matrix P onto the columns of Q? 


4.12 If Gram-Schmidt starts with the vectors a = (2,0) and b = (1,1), which two 
orthonormal vectors does it produce? If we keep a = (2,0) does Gram-Schmidt 
always produce the same two orthonormal vectors? 


4.13 True? Every permutation matrix is an orthogonal matrix. 


4.14 The inverse of the orthogonal matrix Q is ____.




Chapter 5 


5.1 What is the determinant of the matrix −I?
5.2 Explain how the determinant is a linear function of the first row.
5.3 How do you know that det A^{-1} = 1/det A?


5.4 If the pivots of A (with no row exchanges) are 2, 6, 6, what submatrices of A have
known determinants?


5.5 Suppose the first row of A is 0, 0, 0, 3. What does the “big formula” for the
determinant of A reduce to in this case?


5.6 Is the ordering (2, 5, 3, 4, 1) even or odd? What permutation matrix has what
determinant, from your answer?


5.7 What is the cofactor C23 in the 3 by 3 elimination matrix E that subtracts 4 times
row 1 from row 2? What entry of E^{-1} is revealed?


5.8 Explain the meaning of the cofactor formula for det A using column 1.
5.9 How does Cramer’s Rule give the first component in the solution to Ix = b?


5.10 If I combine the entries in row 2 with the cofactors from row 1, why is a21C11 +
a22C12 + a23C13 automatically zero?


5.11 What is the connection between determinants and volumes?
5.12 Find the cross product of u = (0, 0, 1) and v = (0, 1, 0) and its direction.


5.13 If A is n by n, why is det(A − λI) a polynomial in λ of degree n?


Chapter 6 


6.1 What equation gives the eigenvalues of A without involving the eigenvectors? How
would you then find the eigenvectors?


6.2 If A is singular what does this say about its eigenvalues?
6.3 If A times A equals 4A, what numbers can be eigenvalues of A?
6.4 Find a real matrix that has no real eigenvalues or eigenvectors.
6.5 How can you find the sum and product of the eigenvalues directly from A?
6.6 What are the eigenvalues of the rank one matrix [1 2 1]^T [1 1 1]?


6.7 Explain the diagonalization formula A = SΛS^{-1}. Why is it true and when is it true?
6.8 What is the difference between the algebraic and geometric multiplicities of an
eigenvalue of A? Which might be larger?
6.9 Explain why the trace of AB equals the trace of BA.
6.10 How do the eigenvectors of A help to solve du/dt = Au?
6.11 How do the eigenvectors of A help to solve u_{k+1} = Au_k?
6.12 Define the matrix exponential e^A and its inverse and its square.


6.13 If A is symmetric, what is special about its eigenvectors? Do any other matrices have 
eigenvectors with this property? 


6.14 What is the diagonalization formula when A is symmetric? 
6.15 What does it mean to say that A is positive definite? 

6.16 When is B = ATA a positive definite matrix (A is real)? 

6.17 If A is positive definite describe the surface x^T Ax = 1 in R^n.


6.18 What does it mean for A and B to be similar? What is sure to be the same for A and 
B? 


6.19 The 3 by 3 matrix with ones for i ≥ j has what Jordan form?
6.20 The SVD expresses A as a product of what three types of matrices? 
6.21 How is the SVD for A linked to ATA? 


Chapter 7 


7.1 Define a linear transformation from R? to R? and give one example. 


7.2 If the upper middle house on the cover of the book is the original, find something 
nonlinear in the transformations of the other eight houses. 


7.3 If a linear transformation takes every vector in the input basis into the next basis 
vector (and the last into zero), what is its matrix? 


7.4 Suppose we change from the standard basis (the columns of I) to the basis given by
the columns of A (invertible matrix). What is the change of basis matrix M?


7.5 Suppose our new basis is formed from the eigenvectors of a matrix A. What matrix 
represents A in this new basis? 


7.6 If A and B are the matrices representing linear transformations S and T on R^n, what
matrix represents the transformation from v to S(T(v))?


7.7 Describe five important factorizations of a matrix A and explain when each of them 
succeeds (what conditions on A?). 


GLOSSARY: A DICTIONARY FOR 
LINEAR ALGEBRA 


Adjacency matrix of a graph. Square matrix with a_ij = 1 when there is an edge from
node i to node j; otherwise a_ij = 0. A = A^T when edges go both ways (undirected).


Affine transformation Tv = Av + v0 = linear transformation plus shift.
Associative Law (AB)C = A(BC). Parentheses can be removed to leave ABC.


Augmented matrix [A b]. Ax = b is solvable when b is in the column space of A; then
[A b] has the same rank as A. Elimination on [A b] keeps equations correct.


Back substitution. Upper triangular systems are solved in reverse order x_n to x1.


Basis for V. Independent vectors v1, ..., v_d whose linear combinations give each vector
in V as v = c1v1 + ··· + c_dv_d. V has many bases, each basis gives unique c’s.
A vector space has many bases!


Big formula for n by n determinants. Det(A) is a sum of n! terms. For each term:
Multiply one entry from each row and column of A: rows in order 1, ..., n and
column order given by a permutation P. Each of the n! P’s has a + or − sign.


Block matrix. A matrix can be partitioned into matrix blocks, by cuts between rows and/or
between columns. Block multiplication of AB is allowed if the block shapes permit.


Cayley-Hamilton Theorem. p(λ) = det(A − λI) has p(A) = zero matrix.


Change of basis matrix M. The old basis vectors v_j are combinations Σ m_ij w_i of the
new basis vectors. The coordinates of c1v1 + ··· + c_nv_n = d1w1 + ··· + d_nw_n are
related by d = Mc. (For n = 2 set v1 = m11w1 + m21w2, v2 = m12w1 + m22w2.)


Characteristic equation det(A − λI) = 0. The n roots are the eigenvalues of A.
Cholesky factorization A = C^T C = (L√D)(L√D)^T for positive definite A.


Circulant matrix C. Constant diagonals wrap around as in cyclic shift S. Every circulant
is c0 I + c1 S + ··· + c_{n−1}S^{n−1}. Cx = convolution c * x. Eigenvectors in F.


Cofactor C_ij. Remove row i and column j; multiply the determinant by (−1)^{i+j}.


Column picture of Ax = b. The vector b becomes a combination of the columns of A.
The system is solvable only when b is in the column space C(A).


Column space C(A) = space of all combinations of the columns of A.
Commuting matrices AB = BA. If diagonalizable, they share n eigenvectors.


Companion matrix. Put c1, ..., c_n in row n and put n − 1 ones just above the main
diagonal. Then det(A − λI) = ±(c1 + c2λ + c3λ² + ··· + c_nλ^{n−1} − λ^n).


Complete solution x = x_p + x_n to Ax = b. (Particular x_p) + (x_n in nullspace).
Complex conjugate z̄ = a − ib for any complex number z = a + ib. Then z z̄ = |z|².


Condition number cond(A) = c(A) = ||A|| ||A^{-1}|| = σmax/σmin. In Ax = b, the
relative change ||δx||/||x|| is less than cond(A) times the relative change ||δb||/||b||.
Condition numbers measure the sensitivity of the output to change in the input.

Conjugate Gradient Method. A sequence of steps (end of Chapter 9) to solve positive
definite Ax = b by minimizing ½x^T Ax − x^T b over growing Krylov subspaces.

Covariance matrix Σ. When random variables x_i have mean = average value = 0, their
covariances Σ_ij are the averages of x_i x_j. With means x̄_i, the matrix Σ = mean of
(x − x̄)(x − x̄)^T is positive (semi)definite; Σ is diagonal if the x_i are independent.

Cramer’s Rule for Ax = b. B_j has b replacing column j of A; x_j = det B_j / det A.

Cross product u × v in R³. Vector perpendicular to u and v, length ||u|| ||v|| |sin θ| = area
of parallelogram, u × v = “determinant” of [i j k; u1 u2 u3; v1 v2 v3].

Cyclic shift S. Permutation with s21 = 1, s32 = 1, ..., finally s1n = 1. Its eigenvalues
are the nth roots e^{2πik/n} of 1; eigenvectors are columns of the Fourier matrix F.


Determinant |A| = det(A). Defined by det I = 1, sign reversal for row exchange, and
linearity in each row. Then |A| = 0 when A is singular. Also |AB| = |A||B| and
|A^{-1}| = 1/|A| and |A^T| = |A|. The big formula for det(A) has a sum of n! terms,
the cofactor formula uses determinants of size n − 1, volume of box = |det(A)|.


Diagonal matrix D. d_ij = 0 if i ≠ j. Block-diagonal: zero outside square blocks D_ii.


Diagonalizable matrix A. Must have n independent eigenvectors (in the columns of S;
automatic with n different eigenvalues). Then S^{-1}AS = Λ = eigenvalue matrix.


Diagonalization Λ = S^{-1}AS. Λ = eigenvalue matrix and S = eigenvector matrix of A.
A must have n independent eigenvectors to make S invertible. All A^k = SΛ^k S^{-1}.


Dimension of vector space dim(V) = number of vectors in any basis for V. 
Distributive Law A(B + C) = AB + AC. Add then multiply, or multiply then add. 


Dot product = Inner product x^T y = x1y1 + ··· + x_ny_n. Complex dot product is x̄^T y.
Perpendicular vectors have x^T y = 0. (AB)_ij = (row i of A)^T (column j of B).


Echelon matrix U . The first nonzero entry (the pivot) in each row comes in a later column 
than the pivot in the previous row. All zero rows come last. 


Eigenvalue λ and eigenvector x. Ax = λx with x ≠ 0 so det(A − λI) = 0.


Elimination. A sequence of row operations that reduces A to an upper triangular U or
to the reduced form R = rref(A). Then A = LU with multipliers ℓ_ij in L, or
PA = LU with row exchanges in P, or EA = R with an invertible E.


Elimination matrix = Elementary matrix E_ij. The identity matrix with an extra −ℓ_ij
in the i, j entry (i ≠ j). Then E_ij A subtracts ℓ_ij times row j of A from row i.

Ellipse (or ellipsoid) x^T Ax = 1. A must be positive definite; the axes of the ellipse are
eigenvectors of A, with lengths 1/√λ. (For ||x|| = 1 the vectors y = Ax lie on the
ellipse ||A^{-1}y||² = y^T(AA^T)^{-1}y = 1 displayed by eigshow; axis lengths σ_i.)

Exponential e^{At} = I + At + (At)²/2! + ··· has derivative Ae^{At}; e^{At}u(0) solves u′ = Au.

Factorization A = LU. If elimination takes A to U without row exchanges, then the
lower triangular L with multipliers ℓ_ij (and ℓ_ii = 1) brings U back to A.


Fast Fourier Transform (FFT). A factorization of the Fourier matrix F_n into ℓ = log2 n
matrices S_i times a permutation. Each S_i needs only n/2 multiplications, so F_n x
and F_n^{-1}c can be computed with nℓ/2 multiplications. Revolutionary.


Fibonacci numbers 0, 1, 1, 2, 3, 5, ... satisfy F_n = F_{n−1} + F_{n−2} = (λ1^n − λ2^n)/(λ1 − λ2).
Growth rate λ1 = (1 + √5)/2 is the largest eigenvalue of the Fibonacci matrix [1 1; 1 0].

Four Fundamental Subspaces C(A), N(A), C(A^T), N(A^T). Use Ā^T for complex A.


Fourier matrix F. Entries F_jk = e^{2πijk/n} give orthogonal columns F̄^T F = nI. Then
y = Fc is the (inverse) Discrete Fourier Transform y_j = Σ c_k e^{2πijk/n}.


Free columns of A. Columns without pivots; these are combinations of earlier columns. 


Free variable x_i. Column i has no pivot in elimination. We can give the n − r free
variables any values, then Ax = b determines the r pivot variables (if solvable!).


Full column rank r = n. Independent columns, N(A) = {0}, no free variables.


Full row rank r = m. Independent rows, at least one solution to Ax = b, column space
is all of R^m. Full rank means full column rank or full row rank.


Fundamental Theorem. The nullspace N(A) and row space C(A^T) are orthogonal
complements in R^n (perpendicular from Ax = 0 with dimensions n − r and r). Applied
to A^T, the column space C(A) is the orthogonal complement of N(A^T) in R^m.


Gauss-Jordan method. Invert A by row operations on [A I] to reach [I A^{-1}].


Gram-Schmidt orthogonalization A = QR. Independent columns in A, orthonormal 
columns in Q. Each column q; of Q is a combination of the first j columns of A 
(and conversely, so R is upper triangular). Convention: diag(R) > 0. 


Graph G. Set of n nodes connected pairwise by m edges. A complete graph has all 
n(n — 1)/2 edges between nodes. A tree has only n — 1 edges and no closed loops. 


Hankel matrix H. Constant along each antidiagonal; h_ij depends on i + j.


Hermitian matrix A^H = Ā^T = A. Complex analog a_ji = ā_ij of a symmetric matrix.
Hessenberg matrix H. Triangular matrix with one extra nonzero adjacent diagonal.


Hilbert matrix hilb(n). Entries H_ij = 1/(i + j − 1) = ∫_0^1 x^{i−1}x^{j−1} dx. Positive definite
but extremely small λmin and large condition number: H is ill-conditioned.


Hypercube matrix P_L². Row n + 1 counts corners, edges, faces, ... of a cube in R^n.
Identity matrix I (or I_n). Diagonal entries = 1, off-diagonal entries = 0.


Incidence matrix of a directed graph. The m by n edge-node incidence matrix has a 
row for each edge (node i to node j), with entries —1 and 1 in columns i and j. 


Indefinite matrix. A symmetric matrix with eigenvalues of both signs (+ and −).
Independent vectors v1, ..., v_k. No combination c1v1 + ··· + c_kv_k = zero vector
unless all c_i = 0. If the v’s are the columns of A, the only solution to Ax = 0 is
x = 0.


Inverse matrix A^{-1}. Square matrix with A^{-1}A = I and AA^{-1} = I. No inverse if
det A = 0 and rank(A) < n and Ax = 0 for a nonzero vector x. The inverses of AB
and A^T are B^{-1}A^{-1} and (A^{-1})^T. Cofactor formula (A^{-1})_ij = C_ji / det A.


Iterative method. A sequence of steps intended to approach the desired solution. 


Jordan form J = M^{-1}AM. If A has s independent eigenvectors, its “generalized”
eigenvector matrix M gives J = diag(J1, ..., J_s). The block J_k is λ_k I_k + N_k where
N_k has 1’s on diagonal 1. Each block has one eigenvalue λ_k and one eigenvector.


Kirchhoff’s Laws. Current Law: net current (in minus out) is zero at each node. Voltage 
Law: Potential differences (voltage drops) add to zero around any closed loop. 


Kronecker product (tensor product) A ⊗ B. Blocks a_ij B, eigenvalues λ_p(A) λ_q(B).


Krylov subspace K_j(A, b). The subspace spanned by b, Ab, ..., A^{j-1}b. Numerical
methods approximate A^{-1}b by x_j with residual b - Ax_j in this subspace. A good
basis for K_j requires only multiplication by A at each step.


Least squares solution x̂. The vector x̂ that minimizes the error ||e||^2 solves A^T A x̂ =
A^T b. Then e = b - A x̂ is orthogonal to all columns of A.


Left inverse A^+. If A has full column rank n, then A^+ = (A^T A)^{-1} A^T has A^+ A = I_n.
Left nullspace N(A^T). Nullspace of A^T = “left nullspace” of A because y^T A = 0^T.
Length ||x||. Square root of x^T x (Pythagoras in n dimensions).

Linear combination cv + dw or Σ c_j v_j. Vector addition and scalar multiplication.


Linear transformation T. Each vector v in the input space transforms to T (v) in the 
output space, and linearity requires T(cv + dw) = cT(v) + d T(w). Examples: 
Matrix multiplication Av, differentiation and integration in function space. 


Linearly dependent v_1, ..., v_n. A combination other than all c_i = 0 gives Σ c_i v_i = 0.

Lucas numbers L_n = 2, 1, 3, 4, ... satisfy L_n = L_{n-1} + L_{n-2} = λ_1^n + λ_2^n, with λ_1, λ_2 =
(1 ± √5)/2 from the Fibonacci matrix [1 1; 1 0]. Compare L_0 = 2 with F_0 = 0.
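
A short numerical check of the two descriptions of the Lucas numbers, written as a sketch with illustrative function names: the recurrence L_n = L_{n-1} + L_{n-2} starting from 2, 1, and the eigenvalue formula with (1 ± √5)/2. The second uses floating point, so it is rounded before comparing.

```python
import math

def lucas(n):
    """n-th Lucas number from the recurrence, with L_0 = 2 and L_1 = 1."""
    a, b = 2, 1
    for _ in range(n):
        a, b = b, a + b
    return a

def lucas_from_eigenvalues(n):
    """Sum of n-th powers of the two eigenvalues of the Fibonacci matrix."""
    lam1 = (1 + math.sqrt(5)) / 2
    lam2 = (1 - math.sqrt(5)) / 2
    return lam1 ** n + lam2 ** n

for n in range(11):
    print(n, lucas(n), round(lucas_from_eigenvalues(n)))
```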

Markov matrix M. All m_ij ≥ 0 and each column sum is 1. Largest eigenvalue λ = 1. If
m_ij > 0, the columns of M^k approach the steady state eigenvector Ms = s > 0.
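
The approach to the steady state can be sketched by repeated multiplication. The 2 by 2 matrix M below is a hypothetical example (positive entries, column sums 1), and steady_state is an illustrative helper; multiplying a probability vector by M many times is the same idea as watching the columns of M^k.

```python
def mat_vec(m, v):
    """Matrix times vector, using plain lists."""
    return [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m))]

def steady_state(m, steps=200):
    """Multiply a starting probability vector by M repeatedly."""
    n = len(m)
    v = [1.0 / n] * n          # start from equal probabilities
    for _ in range(steps):
        v = mat_vec(m, v)
    return v

# Hypothetical Markov matrix: all entries positive, each column sums to 1.
M = [[0.8, 0.3],
     [0.2, 0.7]]
s = steady_state(M)
print(s)               # close to (0.6, 0.4)
print(mat_vec(M, s))   # Ms is again close to s
```

For this M the other eigenvalue is 0.5, so the error is halved at every step.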


Matrix multiplication AB. The i, j entry of AB is (row i of A)·(column j of B) =
Σ a_ik b_kj. By columns: Column j of AB = A times column j of B. By rows: row
i of A multiplies B. Columns times rows: AB = sum of (column k)(row k). All 
these equivalent definitions come from the rule that AB times x equals A times Bx. 
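
Two of these equivalent definitions can be compared directly. This is a small sketch with illustrative function names: one routine uses dot products of rows with columns, the other adds up the rank-one pieces (column k)(row k).

```python
def multiply_by_dot_products(a, b):
    """Entry (i, j) of AB is (row i of A) . (column j of B)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def multiply_columns_times_rows(a, b):
    """AB is the sum over k of (column k of A)(row k of B)."""
    m, n = len(a), len(b[0])
    total = [[0] * n for _ in range(m)]
    for k in range(len(b)):
        for i in range(m):
            for j in range(n):
                total[i][j] += a[i][k] * b[k][j]   # entry of one rank-one piece
    return total

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(multiply_by_dot_products(A, B))
print(multiply_columns_times_rows(A, B))
```

Both routines perform the same multiplications a_ik b_kj; only the order of the loops differs.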


Minimal polynomial of A. The lowest degree polynomial with m(A) = zero matrix. This
is p(λ) = det(A - λI) if no eigenvalues are repeated; always m(λ) divides p(λ).


Multiplication Ax = x_1(column 1) + ... + x_n(column n) = combination of columns.
Multiplicities AM and GM. The algebraic multiplicity AM of λ is the number of times
λ appears as a root of det(A - λI) = 0. The geometric multiplicity GM is the
number of independent eigenvectors for λ (= dimension of the eigenspace).


Multiplier ℓ_ij. The pivot row j is multiplied by ℓ_ij and subtracted from row i to eliminate
the i, j entry: ℓ_ij = (entry to eliminate) / (jth pivot).


Network. A directed graph that has constants c_1, ..., c_m associated with the edges.


Nilpotent matrix N. Some power of N is the zero matrix, N^k = 0. The only eigenvalue
is λ = 0 (repeated n times). Examples: triangular matrices with zero diagonal.


Norm ||A||. The “ℓ^2 norm” of A is the maximum ratio ||Ax||/||x|| = σ_max. Then ||Ax|| ≤
||A|| ||x|| and ||AB|| ≤ ||A|| ||B|| and ||A + B|| ≤ ||A|| + ||B||. Frobenius norm
||A||_F^2 = ΣΣ a_ij^2. The ℓ^1 and ℓ^∞ norms are largest column and row sums of |a_ij|.


Normal equation A^T A x̂ = A^T b. Gives the least squares solution to Ax = b if A has full
rank n (independent columns). The equation says that (columns of A)·(b - A x̂) = 0.
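
For fitting a line C + Dt, the matrix A has a column of ones and a column of times, so A^T A is 2 by 2 and the normal equation can be solved by hand. The sketch below assumes that setting; fit_line and the three data points are illustrative, not taken from the Teaching Codes.

```python
from fractions import Fraction

def fit_line(ts, bs):
    """Least squares line C + D t from the 2 by 2 normal equations A^T A x = A^T b."""
    m = len(ts)
    s_t = sum(Fraction(t) for t in ts)                     # sum of t_i
    s_tt = sum(Fraction(t) * t for t in ts)                # sum of t_i^2
    s_b = sum(Fraction(b) for b in bs)                     # sum of b_i
    s_tb = sum(Fraction(t) * b for t, b in zip(ts, bs))    # sum of t_i b_i
    det = m * s_tt - s_t * s_t                             # det of A^T A
    if det == 0:
        raise ValueError("columns of A are dependent (all t equal)")
    c = (s_b * s_tt - s_t * s_tb) / det
    d = (m * s_tb - s_t * s_b) / det
    return c, d

ts, bs = [0, 1, 2], [6, 0, 0]          # illustrative data points (t_i, b_i)
C, D = fit_line(ts, bs)
errors = [b - (C + D * t) for t, b in zip(ts, bs)]
print(C, D, errors)
```

The error vector is perpendicular to both columns of A: its entries sum to zero, and so do the products t_i e_i.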


Normal matrix. If N N̄^T = N̄^T N, then N has orthonormal (complex) eigenvectors.
Nullspace N(A) = All solutions to Ax = 0. Dimension n - r = (# columns) - rank.
Nullspace matrix N. The columns of N are the n - r special solutions to As = 0.


Orthogonal matrix Q. Square matrix with orthonormal columns, so Q^T = Q^{-1}.
Preserves length and angles, ||Qx|| = ||x|| and (Qx)^T(Qy) = x^T y. All |λ| = 1,
with orthogonal eigenvectors. Examples: Rotation, reflection, permutation.


Orthogonal subspaces. Every v in V is orthogonal to every w in W. 


Orthonormal vectors q_1, ..., q_n. Dot products are q_i^T q_j = 0 if i ≠ j and q_i^T q_i = 1.
The matrix Q with these orthonormal columns has Q^T Q = I. If m = n then Q^T =
Q^{-1} and q_1, ..., q_n is an orthonormal basis for R^n: every v = Σ (v^T q_j) q_j.


Outer product uv^T = column times row = rank one matrix.


Partial pivoting. In each column, choose the largest available pivot to control roundoff; 
all multipliers have |ℓ_ij| ≤ 1. See condition number.

Particular solution x_p. Any solution to Ax = b; often x_p has free variables = 0.

Pascal matrix P_S = pascal(n) = the symmetric matrix with binomial entries C(i+j-2, i-1).
P_S = P_L P_U all contain Pascal’s triangle with det = 1 (see Pascal in the index).
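
The factorization of the symmetric Pascal matrix into lower and upper triangular Pascal matrices can be verified for small n. This sketch assumes the binomial entries C(i+j-2, i-1) for 1-based i, j (the convention of MATLAB's pascal(n)); the function names are illustrative and the code uses 0-based indices.

```python
from math import comb

def pascal_symmetric(n):
    """Symmetric Pascal matrix; with 0-based i, j the entry is C(i + j, i)."""
    return [[comb(i + j, i) for j in range(n)] for i in range(n)]

def pascal_lower(n):
    """Lower triangular Pascal matrix; comb(i, j) is 0 above the diagonal."""
    return [[comb(i, j) for j in range(n)] for i in range(n)]

def multiply(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(a):
    return [list(col) for col in zip(*a)]

n = 4
PL = pascal_lower(n)
PU = transpose(PL)
print(pascal_symmetric(n))
print(multiply(PL, PU) == pascal_symmetric(n))
```

Since P_L and P_U are triangular with 1's on the diagonal, each has determinant 1, and so does their product P_S.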


Permutation matrix P. There are n! orders of 1, ..., n. The n! P’s have the rows of I in
those orders. PA puts the rows of A in the same order. P is even or odd (det P = 1
or -1) based on the number of row exchanges to reach I.
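
Counting row exchanges gives det P directly. The following sketch uses an illustrative function permutation_sign and 0-based orders (so the identity order is 0, 1, ..., n-1); it is not the signperm Teaching Code.

```python
def permutation_sign(order):
    """det P (+1 or -1) for the permutation matrix with rows of I in this order.

    Exchanges entries until the order becomes 0, 1, ..., n-1 and flips the
    sign once per exchange.
    """
    p = list(order)
    sign = 1
    for i in range(len(p)):
        while p[i] != i:
            j = p[i]
            p[i], p[j] = p[j], p[i]   # one row exchange
            sign = -sign
    return sign

print(permutation_sign([0, 1, 2]))   # identity: even
print(permutation_sign([1, 0, 2]))   # one exchange: odd
print(permutation_sign([1, 2, 0]))   # two exchanges: even
```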


Pivot columns of A. Columns that contain pivots after row reduction. These are not 
combinations of earlier columns. The pivot columns are a basis for the column space. 


Pivot. The diagonal entry (first nonzero) at the time when a row is used in elimination. 
Plane (or hyperplane) in R^n. Vectors x with a^T x = 0. Plane is perpendicular to a ≠ 0.
Polar decomposition A = QH. Orthogonal Q times positive (semi)definite H.
Positive definite matrix A. Symmetric matrix with positive eigenvalues and positive
pivots. Definition: x^T A x > 0 unless x = 0. Then A = LDL^T with diag(D) > 0.


Projection p = a(a^T b / a^T a) onto the line through a. P = aa^T / a^T a has rank 1.


Projection matrix P onto subspace S. Projection p = Pb is the closest point to b in
S, error e = b - Pb is perpendicular to S. P^2 = P = P^T, eigenvalues are 1 or 0,
eigenvectors are in S or S^⊥. If columns of A = basis for S then P = A(A^T A)^{-1} A^T.


Pseudoinverse A^+ (Moore-Penrose inverse). The n by m matrix that “inverts” A from
column space back to row space, with N(A^+) = N(A^T). A^+ A and AA^+ are the
projection matrices onto the row space and column space. Rank(A^+) = rank(A).


Random matrix rand(n) or randn(n). MATLAB creates a matrix with random entries, 
uniformly distributed on [0 1] for rand and standard normal distribution for randn. 


Rank one matrix A = uv^T ≠ 0. Column and row spaces = lines cu and cv.
Rank r(A) = number of pivots = dimension of column space = dimension of row space.


Rayleigh quotient q(x) = x^T A x / x^T x for symmetric A: λ_min ≤ q(x) ≤ λ_max. Those
extremes are reached at the eigenvectors x for λ_min(A) and λ_max(A).


Reduced row echelon form R = rref(A). Pivots = 1; zeros above and below pivots; the 
r nonzero rows of R give a basis for the row space of A. 


Reflection matrix (Householder) Q = I - 2uu^T. Unit vector u is reflected to Qu = -u.
All x in the plane mirror u^T x = 0 have Qx = x. Notice Q^T = Q^{-1} = Q.


Right inverse A^+. If A has full row rank m, then A^+ = A^T(AA^T)^{-1} has AA^+ = I_m.


Rotation matrix R = [c -s; s c] rotates the plane by θ and R^{-1} = R^T rotates back by -θ.
Eigenvalues are e^{iθ} and e^{-iθ}, eigenvectors are (1, ±i). c, s = cos θ, sin θ.


Row picture of Ax = b. Each equation gives a plane in R^n; the planes intersect at x.
Row space C(A^T) = all combinations of rows of A. Column vectors by convention.


Saddle point of f(x_1, ..., x_n). A point where the first derivatives of f are zero and the
second derivative matrix (∂²f/∂x_i∂x_j = Hessian matrix) is indefinite.


Schur complement S = D - CA^{-1}B. Appears in block elimination on [A B; C D].
Schwarz inequality |v·w| ≤ ||v|| ||w||. Then |v^T A w|^2 ≤ (v^T A v)(w^T A w) for pos def A.
Semidefinite matrix A. (Positive) semidefinite: all x^T A x ≥ 0, all λ ≥ 0; A = any R^T R.
Similar matrices A and B. Every B = M^{-1}AM has the same eigenvalues as A.


Simplex method for linear programming. The minimum cost vector x* is found by 
moving from corner to lower cost corner along the edges of the feasible set (where 
the constraints Ax = b and x ≥ 0 are satisfied). Minimum cost at a corner!


Singular matrix A. A square matrix that has no inverse: det(A) = 0. 


Singular Value Decomposition (SVD) A = UΣV^T = (orthogonal)(diag)(orthogonal).
First r columns of U and V are orthonormal bases of C(A) and C(A^T), Av_i = σ_i u_i
with singular value σ_i > 0. Last columns are orthonormal bases of nullspaces.
Skew-symmetric matrix K. The transpose is -K, since K_ij = -K_ji. Eigenvalues are
pure imaginary, eigenvectors are orthogonal, e^{Kt} is an orthogonal matrix.


Solvable system Ax = b. The right side b is in the column space of A. 

Spanning set. Combinations of v_1, ..., v_m fill the space. The columns of A span C(A)!
Special solutions to As = 0. One free variable is s_i = 1, other free variables = 0.
Spectral Theorem A = QΛQ^T. Real symmetric A has real λ’s and orthonormal q’s.
Spectrum of A = the set of eigenvalues {λ_1, ..., λ_n}. Spectral radius = max of |λ_i|.
Standard basis for R^n. Columns of n by n identity matrix (written i, j, k in R^3).


Stiffness matrix K. If x gives the movements of the nodes, Kx gives the internal forces.
K = A^T C A where C has spring constants from Hooke’s Law and Ax = stretching.


Subspace S of V. Any vector space inside V, including V and Z = {zero vector only}.
Sum V + W of subspaces. Space of all (v in V) + (w in W). Direct sum: V ∩ W = {0}.
Symmetric factorizations A = LDL^T and A = QΛQ^T. Signs in Λ = signs in D.
Symmetric matrix A. The transpose is A^T = A, and a_ij = a_ji. A^{-1} is also symmetric.
Toeplitz matrix. Constant down each diagonal = time-invariant (shift-invariant) filter. 
Trace of A = sum of diagonal entries = sum of eigenvalues of A. Tr AB = Tr BA. 
Transpose matrix A^T. Entries (A^T)_ij = A_ji. A^T is n by m, A^T A is square, symmetric,
positive semidefinite. The transposes of AB and A^{-1} are B^T A^T and (A^T)^{-1}.
Triangle inequality ||u + v|| ≤ ||u|| + ||v||. For matrix norms ||A + B|| ≤ ||A|| + ||B||.
Tridiagonal matrix T: t_ij = 0 if |i - j| > 1. T^{-1} has rank 1 above and below diagonal.
Unitary matrix U^H = Ū^T = U^{-1}. Orthonormal columns (complex analog of Q).


Vandermonde matrix V. Vc = b gives coefficients of p(x) = c_0 + ... + c_{n-1} x^{n-1}
with p(x_i) = b_i. V_ij = (x_i)^{j-1} and det V = product of (x_k - x_i) for k > i.


Vector v in R^n. Sequence of n real numbers v = (v_1, ..., v_n) = point in R^n.
Vector addition. v + w = (v_1 + w_1, ..., v_n + w_n) = diagonal of parallelogram.


Vector space V. Set of vectors such that all combinations cv + dw remain within V. 
Eight required rules are given in Section 3.1 for scalars c,d and vectors v, w. 


Volume of box. The rows (or the columns) of A generate a box with volume |det(A)|.
Wavelets w_jk(t). Stretch and shift the time axis to create w_jk(t) = w_00(2^j t - k).


MATRIX FACTORIZATIONS 


A = LU = (lower triangular L, 1’s on the diagonal) (upper triangular U, pivots on the diagonal)


Requirements: No row exchanges as Gaussian elimination reduces A to U. 
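
Elimination without row exchanges can be sketched in a few lines. The function lu_no_exchanges below is an illustrative helper (it is not the slu Teaching Code); it stores each multiplier in L and stops if a zero pivot would force a row exchange.

```python
from fractions import Fraction

def lu_no_exchanges(a):
    """Return (L, U) with A = LU, L unit lower triangular, U upper triangular."""
    n = len(a)
    u = [[Fraction(x) for x in row] for row in a]
    l = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    for j in range(n):
        if u[j][j] == 0:
            raise ValueError("zero pivot: a row exchange would be needed")
        for i in range(j + 1, n):
            # multiplier = (entry to eliminate) / (jth pivot)
            l[i][j] = u[i][j] / u[j][j]
            for k in range(j, n):
                u[i][k] -= l[i][j] * u[j][k]
    return l, u

L, U = lu_no_exchanges([[2, 1], [6, 8]])
print(L)   # multiplier 3 below the diagonal
print(U)   # pivots 2 and 5 on the diagonal
```

Exact fractions are used so that the pivots and multipliers are not blurred by roundoff.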


A = LDU = (lower triangular L, 1’s on the diagonal) (pivot matrix D is diagonal) (upper triangular U, 1’s on the diagonal)
Requirements: No row exchanges. The pivots in D are divided out to leave 1’s on
the diagonal of U. If A is symmetric then U is L^T and A = LDL^T.

PA = LU (permutation matrix P to avoid zeros in the pivot positions). 
Requirements: A is invertible. Then P,L,U are invertible. P does all of the 
row exchanges in advance, to allow normal LU. Alternative: A = L_1 P_1 U_1.

EA = R (m by m invertible E) (any matrix A) = rref(A). 

Requirements: None! The reduced row echelon form R has r pivot rows and pivot 
columns. The only nonzero in a pivot column is the unit pivot. The last m — r rows 
of E are a basis for the left nullspace of A; they multiply A to give zero rows in R. 
The first r columns of E^{-1} are a basis for the column space of A.

A = C^T C = (lower triangular) (upper triangular) with √D on both diagonals
Requirements: A is symmetric and positive definite (all n pivots in D are positive).
This Cholesky factorization C = chol(A) has C^T = L√D, so C^T C = LDL^T.
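
A row-by-row computation of the upper triangular factor, given as a sketch: cholesky_upper is an illustrative name, and the 2 by 2 matrix is a hypothetical example. The function raises an error when a pivot is not positive, which is how it detects that A is not positive definite.

```python
import math

def cholesky_upper(a):
    """Return upper triangular C with C^T C = A for symmetric positive definite A."""
    n = len(a)
    c = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            # subtract the part of a[i][j] explained by the rows above row i
            s = a[i][j] - sum(c[k][i] * c[k][j] for k in range(i))
            if i == j:
                if s <= 0:
                    raise ValueError("matrix is not positive definite")
                c[i][i] = math.sqrt(s)      # square root of a pivot
            else:
                c[i][j] = s / c[i][i]
    return c

A = [[4.0, 2.0],
     [2.0, 3.0]]
C = cholesky_upper(A)
print(C)   # rows (2, 1) and (0, sqrt(2))
```

Here the pivots are 4 and 2, so the diagonal of C holds their square roots 2 and √2.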

A = QR = (orthonormal columns in Q) (upper triangular R). 

Requirements: A has independent columns. Those are orthogonalized in Q by the 
Gram-Schmidt or Householder process. If A is square then Q^{-1} = Q^T.
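
The Gram-Schmidt route to A = QR can be sketched as follows. This is the "modified" ordering (each projection is subtracted from the current remainder), chosen here as an assumption for numerical stability; gram_schmidt is an illustrative name, not the grams Teaching Code. The input is a list of columns.

```python
import math

def gram_schmidt(columns):
    """Orthonormalize independent columns.

    Returns (qs, r): qs is the list of orthonormal columns of Q and r is upper
    triangular, with column j of A equal to the sum over i of r[i][j] * qs[i].
    """
    n = len(columns)
    qs = []
    r = [[0.0] * n for _ in range(n)]
    for j, a in enumerate(columns):
        v = list(a)
        for i, q in enumerate(qs):
            r[i][j] = sum(qk * vk for qk, vk in zip(q, v))
            v = [vk - r[i][j] * qk for vk, qk in zip(v, q)]   # remove the q_i part
        r[j][j] = math.sqrt(sum(vk * vk for vk in v))          # positive diagonal of R
        if r[j][j] == 0.0:
            raise ValueError("columns are dependent")
        qs.append([vk / r[j][j] for vk in v])
    return qs, r

qs, r = gram_schmidt([[3.0, 4.0], [1.0, 0.0]])
print(qs)   # approximately (0.6, 0.8) and (0.8, -0.6)
print(r)
```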

A = SΛS^{-1} = (eigenvectors in S) (eigenvalues in Λ) (left eigenvectors in S^{-1}).


Requirements: A must have n linearly independent eigenvectors.

A = QΛQ^T = (orthogonal matrix Q) (real eigenvalue matrix Λ) (Q^T is Q^{-1}).


Requirements: A is real and symmetric. This is the Spectral Theorem.


A = MJM^{-1} = (generalized eigenvectors in M) (Jordan blocks in J) (M^{-1}).


Requirements: A is any square matrix. This Jordan form J has a block for each 
independent eigenvector of A. Every block has only one eigenvalue. 


A = UΣV^T = (orthogonal U is m × m) (m × n singular value matrix Σ with σ_1, ..., σ_r on its diagonal) (orthogonal V is n × n)
Requirements: None. This singular value decomposition (SVD) has the eigenvectors
of AA^T in U and eigenvectors of A^T A in V; σ_i = √λ_i(A^T A) = √λ_i(AA^T).


A^+ = VΣ^+U^T = (orthogonal V, n × n) (n × m pseudoinverse of Σ with 1/σ_1, ..., 1/σ_r on diagonal) (orthogonal U^T, m × m)
Requirements: None. The pseudoinverse A^+ has A^+ A = projection onto row space
of A and AA^+ = projection onto column space. The shortest least-squares solution
to Ax = b is x̂ = A^+ b. This solves A^T A x̂ = A^T b.


A = QH = (orthogonal matrix Q) (symmetric positive definite matrix H). 


Requirements: A is invertible. This polar decomposition has H^2 = A^T A. The
factor H is semidefinite if A is singular. The reverse polar decomposition A = KQ
has K^2 = AA^T. Both have Q = UV^T from the SVD.


A = UΛU^{-1} = (unitary U) (eigenvalue matrix Λ) (U^{-1} which is U^H = Ū^T).


Requirements: A is normal: A^H A = AA^H. Its orthonormal (and possibly complex)
eigenvectors are the columns of U. Complex λ’s unless A = A^H: Hermitian case.


A = UTU^{-1} = (unitary U) (triangular T with λ’s on diagonal) (U^{-1} = U^H).


Requirements: Schur triangularization of any square A. There is a matrix U with
orthonormal columns that makes U^{-1}AU triangular: Section 6.4.


F_n = [I D; I -D] [F_{n/2} 0; 0 F_{n/2}] [even-odd permutation] = one step of the (recursive) FFT.
Requirements: F_n = Fourier matrix with entries w^{jk} where w^n = 1: F_n F̄_n = nI.
D has 1, w, ..., w^{n/2-1} on its diagonal. For n = 2^ℓ the Fast Fourier Transform
will compute F_n x with only ½ nℓ = ½ n log_2 n multiplications from ℓ stages of D’s.
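
The one-step factorization translates into a short recursion. The sketch below assumes w = e^{2πi/n} (positive exponent, matching entries w^{jk}); many software libraries use the conjugate convention. The names fft and fourier_multiply are illustrative, and fourier_multiply is the direct n^2 product used only for comparison.

```python
import cmath

def fourier_multiply(x):
    """Direct product F_n x, where F_n has entries w^(jk) and w = exp(2*pi*i/n)."""
    n = len(x)
    w = cmath.exp(2j * cmath.pi / n)
    return [sum(w ** (j * k) * x[k] for k in range(n)) for j in range(n)]

def fft(x):
    """Recursive F_n x for n a power of 2."""
    n = len(x)
    if n == 1:
        return list(x)
    if n % 2:
        raise ValueError("length must be a power of 2")
    even = fft(x[0::2])    # F_{n/2} times the even-numbered components
    odd = fft(x[1::2])     # F_{n/2} times the odd-numbered components
    w = cmath.exp(2j * cmath.pi / n)
    top = [even[j] + w ** j * odd[j] for j in range(n // 2)]      # blocks I and D
    bottom = [even[j] - w ** j * odd[j] for j in range(n // 2)]   # blocks I and -D
    return top + bottom

x = [1, 2, 3, 4]
print(fft(x))
print(fourier_multiply(x))
```

The slices x[0::2] and x[1::2] play the role of the even-odd permutation, and the factors w ** j are the diagonal entries of D.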


MATLAB TEACHING CODES


These Teaching Codes are directly available from web.mit.edu/18.06


cofactor    Compute the n by n matrix of cofactors.
cramer      Solve the system Ax = b by Cramer’s Rule.
deter       Matrix determinant computed from the pivots in PA = LU.
eigen2      Eigenvalues, eigenvectors, and det(A - λI) for 2 by 2 matrices.
eigshow     Graphical demonstration of eigenvalues and singular values.
eigval      Eigenvalues and their multiplicity as roots of det(A - λI) = 0.
eigvec      Compute as many linearly independent eigenvectors as possible.
elim        Reduction of A to row echelon form R by an invertible E.
findpiv     Find a pivot for Gaussian elimination (used by plu).
fourbase    Construct bases for all four fundamental subspaces.
grams       Gram-Schmidt orthogonalization of the columns of A.
house       2 by 12 matrix giving corner coordinates of a house.
inverse     Matrix inverse (if it exists) by Gauss-Jordan elimination.
leftnull    Compute a basis for the left nullspace.
linefit     Plot the least squares fit to m given points by a line.
lsq         Least squares solution to Ax = b from A^T A x̂ = A^T b.
normal      Eigenvalues and orthonormal eigenvectors when A^T A = AA^T.
nulbasis    Matrix of special solutions to Ax = 0 (basis for nullspace).
orthcomp    Find a basis for the orthogonal complement of a subspace.
partic      Particular solution of Ax = b, with all free variables zero.
plot2d      Two-dimensional plot for the house figures.
plu         Rectangular PA = LU factorization with row exchanges.
poly2str    Express a polynomial as a string.
project     Project a vector b onto the column space of A.
projmat     Construct the projection matrix onto the column space of A.
randperm    Construct a random permutation.
rowbasis    Compute a basis for the row space from the pivot rows of R.
samespan    Test whether two matrices have the same column space.
signperm    Determinant of the permutation matrix with rows ordered by p.
slu         LU factorization of a square matrix using no row exchanges.
slv         Apply slu to solve the system Ax = b allowing no row exchanges.
splu        Square PA = LU factorization with row exchanges.
splv        The solution to a square, invertible system Ax = b.
symmeig     Compute the eigenvalues and eigenvectors of a symmetric matrix.
tridiag     Construct a tridiagonal matrix with constant diagonals a, b, c.


Index


See the entries under Matrix 


A 

Addition of vectors, 2, 3, 33, 121 

All combinations, 5, 122, 123 

Angle between vectors, 14, 15 
Anti-symmetric, 109 (see Skew-symmetric) 
Area, 272, 273, 280 

Arnoldi, 488, 491, 492 

Arrow, 3, 4, 423 

Associative law, 58, 59, 69, 80 

Average, 227, 450, 456 


B 

Back substitution, 45, 49, 98 
Backslash, 99, 156 

Basis, 168, 172, 180, 200, 391 
Big formula, 256, 258 

Big picture, 187, 199, 421 
Binomial, 442, 454 
Bioinformatics, 457 


BLAS: Basic Linear Algebra Subroutines, 466


Block elimination, 71 
Block multiplication, 70, 79 
Block pivot, 94 

Boundary condition, 417 
Bowl, 353 

Box, 273, 276 


C 
Calculus, 25, 281, 417 


Cauchy-Binet, 282 

Cayley-Hamilton, 310, 311, 362 

Centered difference, 25, 28, 316, 328 
Change of basis, 358, 390, 391, 396, 400 
Characteristic polynomial, 287 

Cholesky factorization, 102, 345, 353, 564 
Circle, 315, 316 

Clock, 9 

Closest line, 218, 219, 222 

Cofactors, 255, 259, 260, 265, 270 
Column at a time, 23, 32, 36 

Column picture, 32, 34, 40 

Column space C (A), 123, 124, 130 
Column vector, 2, 4 

Columns times rows, 62, 68, 71, 145, 150 
Combination of columns, 32, 33, 56 
Commutative, 59, 69 

Commuting matrices, 305 

Complete solution, 136, 156, 159, 162, 313 
Complex, 120, 340, 493, 494, 499, 506, 509 
Complex eigenvalues, 289, 333 

Complex eigenvectors, 289, 333 
Compression, 364, 373, 391, 410 
Computational science, 189, 317, 419, 427 
Computer graphics, 459, 462, 463 
Condition number, 371, 477, 478 
Conjugate, 333, 338, 494, 501, 506 
Conjugate gradients, 486, 492 

Constant coefficients, 312 

Convolution, 515 

Corner, 8, 441, 443
Cosine of angle, 15, 17, 447
Cosine Law, 20 

Cosine of matrix, 329 

Cost vector, 440 

Covariance, 228, 453-458 
Cramer’s Rule, 259, 269, 279 
Cross product, 275, 276 
Cube, 8, 73, 274, 281, 464 
Cyclic, 25, 93, 374 


D 

Delta function, 449, 452 

Dependent, 26, 27, 169, 170 

Derivative, 24, 109, 229, 384, 395 

Determinant, 63, 244-280, 288, 295 

Diagonalizable, 300, 304, 308, 334, 335 

Diagonalization, 298, 300, 330, 332, 363, 399

Differential equation, 312-329, 416 


Dimension, 145, 168, 174, 175, 176, 183, 185, 187


Discrete cosines, 336, 373 
Discrete sines, 336, 373 

Distance to subspace, 212 
Distributive law, 69 

Dot product, 11, 56, 108, 447, 502 
Dual problem, 442, 446 


E 

Economics, 435, 439 
Eigencourse, 457, 458 
Eigenvalue, 283, 287, 374, 499 
Eigenvalue changes, 439 
Eigenvalues of A^n, 284, 294, 300
Eigenvalues of uv^T, 297
Eigenvalues of AB, 362 
Eigenvector basis, 399 
Eigenvectors, 283, 287, 374 
Eigshow, 290, 368 
Elimination, 45-66, 83, 86, 135 


Ellipse, 290, 346, 366, 382 

Energy, 343, 409 

Engineering, 409, 419 

Error, 211, 218, 219, 225, 481, 483 
Error equation, 477 

Euler angles, 474 

Euler’s formula, 311, 426, 430, 497 
Even, 113, 246, 258, 452 
Exponential, 314, 319, 327 


F 

Factorization, 95, 110, 235, 348, 370, 374 
False proof, 305, 338 

Fast Fourier Transform, 393, 493, 511, 565 
Feasible set, 440, 441 

FFT (see Fast Fourier Transform), 509-514 
Fibonacci, 75, 266, 268, 301, 302, 306, 308 
Finite difference, 315-317, 417 

Finite elements, 412, 419 

First-order system, 315, 326 

Fixed-free, 410, 414, 417, 419 

Force balance, 412 

FORTRAN, 16, 38 

Forward difference, 30 


Four Fundamental Subspaces, 184-199, 368, 424, 507


Fourier series, 233, 448, 450, 452 
Fourier Transform, 393, 509-514 
Fredholm Alternative, 203 

Free, 133, 135, 137, 144, 146, 155 
Full column rank, 157, 170, 405 
Full row rank, 159, 405 

Function space, 121, 448, 449 
Fundamental Theorem of Linear Algebra, 188, 198, 368 (see Four Fundamental Subspaces)


G 
Gaussian elimination, 45, 49, 135 
Gaussian probability distribution, 455 


Gauss-Jordan, 83, 84, 91, 469 
Gauss-Seidel, 481, 484, 485, 489 

Gene expression data, 457 

Geometric series, 436 

Gershgorin circles, 491 

Gibbs phenomenon, 451 

Givens rotation, 471 

Google, 368, 369, 434 

Gram-Schmidt, 223, 234, 236, 241, 370, 469 
Graph, 74, 143, 307, 311, 420, 422, 423 
Group, 119, 354 


H 

Half-plane, 7 

Heat equation, 322, 323 

Heisenberg, 305, 310 

Hilbert space, 447, 449 

Hooke’s Law, 410, 412 

Householder reflections, 237, 469, 472 
Hyperplane, 30, 42 


I

Ill-conditioned matrix, 371, 473, 474 
Imaginary, 289 

Independent, 26, 27, 134, 168, 200, 300 
Initial value, 313 

Inner product, 11, 56, 108, 448, 502, 506 
Input and output basis, 399

Integral, 24, 385, 386

Interior point method, 445 

Intersection of spaces, 129, 183 

Inverse matrix, 24, 81, 270 

Inverse of AB, 82 

Invertible, 86, 173, 200, 248 

Iteration, 481, 482, 484, 489, 492 


J 

Jacobi, 481, 483, 485, 489 

Jordan form, 356, 357, 358, 361, 482 
JPEG, 364, 373


K

Kalman filter, 93, 214 

Kernel, 377, 380 

Kirchhoff’s Laws, 143, 189, 420, 424-427 
Krylov, 491, 492 


L 

ℓ^1 and ℓ^∞ norm, 225, 480

Lagrange multiplier, 445 

Lanczos method, 490, 492 

LAPACK, 98, 237, 486 

Leapfrog method, 317, 329 

Least squares, 218, 219, 236, 405, 408, 453 
Left nullspace N(A^T), 184, 186, 192, 425
Left-inverse, 81, 86, 154, 405 

Length, 12, 232, 447, 448, 501 

Line, 34, 40, 221, 474 

Line of springs, 411 

Linear combination, 1, 3 

Linear equation, 23 

Linear programming, 440 

Linear transformation, 44, 375-398 
Linearity, 44, 245, 246 

Linearly independent, 26, 134, 168, 169, 200 
LINPACK, 465 

Loop, 307, 425, 426 

Lower triangular, 95 

lu, 98, 100, 474 

Lucas numbers, 306 


M 

Maple, 38, 100 

Mathematica, 38, 100 

MATLAB, 17, 37, 237, 243, 290, 337, 513 
Matrix, 22, 384, 387 (see full page 570) 
Matrix exponential, 314, 319, 327 

Matrix multiplication, 58, 59, 67, 389 
Matrix notation, 37 

Matrix space, 121, 122, 175, 181, 311 


570 Index 


With the single heading “Matrix” this page indexes the active life of linear algebra. 


Matrix,
-1, 2, -1 matrix, 106, 167, 261, 265, 349, 374, 410, 480
Adjacency, 74, 80, 311, 369
All-ones, 251, 262, 307, 348
Augmented, 60, 34, 155
Band, 99, 468, 469
Block, 70, 94, 115, 266, 348
Circulant, 507, 515
Coefficient, 33, 36
Cofactor matrix, 270
Companion, 295, 322
Complex matrix, 339, 499
Consumption, 435, 436
Covariance, 228, 453, 455, 456, 458
Cyclic, 25, 93, 374
Derivative, 385
Difference, 22, 87, 412
Echelon, 137, 143
Eigenvalue matrix Λ, 298
Eigenvector matrix S, 298
Elimination, 57, 63, 149
Exponential, 314, 319, 327
First difference, 22, 373
Fourier, 394, 493, 505, 510, 511
Hadamard, 238, 280
Hermitian, 339, 340, 501, 503, 506, 507
Hessenberg, 262, 488, 492
Hilbert, 92, 254, 348
House, 378, 382
Hypercube, 73
Identity, 37, 42, 57, 390
Incidence, 420, 422, 429
Indefinite, 343
Inverse, 24, 81, 270
Invertible, 27, 83, 86, 112, 408, 574
Jacobian, 274
Jordan, 356, 358, 462, 565
Laplacian (Graph Laplacian), 428
Leslie, 435, 439
Magic, 43
Markov, 43, 285, 294, 369, 373, 431, 437
Negative definite, 343
Nondiagonalizable, 299, 304, 309
Normal, 341, 508, 565
Northwest, 119
Nullspace matrix, 136, 147
Orthogonal, 231, 252, 289
Pascal, 66, 72, 88, 101, 348, 359
Permutation, 59, 111, 116, 183, 297
Pivot matrix, 97, 104
Population, 435
Positive matrix, 413, 431, 434, 436
Positive definite, 343, 344, 351, 409, 475
Projection, 206, 208, 210, 233, 285, 388, 462, 463
Pseudoinverse, 199, 399, 403, 404, 565
Rank-one, 145, 152, 294, 311, 363
Reflection, 243, 286, 336, 469, 471
Rotation, 231, 289, 460, 471
Saddle-point, 115, 343
Second derivative (Hessian), 349, 353
Second difference (1, -2, 1), 322, 373, 417
Semidefinite, 345, 415
Shearing, 379
Similar, 355-362, 400
Sine matrix, 349, 354, 373
Singular, 27, 416, 574
Skew-symmetric, 289, 320, 327, 338, 341
Sparse, 100, 470, 474, 465
Stable, 318
Stiffness, 317, 409, 412, 419
Stoichiometric, 430
Sudoku, 44
Sum matrix, 24, 87, 271
Symmetric, 109, 330-341
Translation, 459, 463
Triangular, 95, 236, 247, 271, 289, 335
Tridiagonal, 85, 100, 265, 413, 468, 491
Unitary, 504, 505, 506, 510
Vandermonde, 226, 253, 266, 511
Wavelet, 242


Mean, 228, 453-457 

Minimum, 349 

Multigrid, 485 

Multiplication by columns, 23, 36 
Multiplication by rows, 36 
Multiplication count, 68, 80, 99, 467, 469 
Multiplicity, 304, 358 

Multiplier, 45, 46, 50, 96 


N 

n choose m, 442, 454 
n-dimensional space R^n, 1, 120
netlib, 100 

Network, 420, 427 

Newton’s method, 445 

No solution, 26, 39, 46, 192 
Nondiagonalizable, 304, 309 
Norm, 12, 475, 476, 479, 480, 489 
Normal distribution, 455 
Normal equation, 210, 211, 453 
Normal matrix, 341, 508, 565 
Nullspace N (A), 132, 185 


O 

Odd permutation, 113 

Ohm’s Law, 426 

Orthogonality, 14, 195, 448 
Orthogonal complement, 197, 198, 200 
Orthogonal spaces, 197 

Orthogonal subspaces, 195, 196, 204 
Orthogonal vectors, 14, 195 
Orthonormal, 230, 234, 240, 504 
Orthonormal basis, 367, 368, 449 


Orthonormal eigenvectors, 203, 307, 330, 332, 339, 341, 503


P 

Parabola, 224 

Parallelogram, 3, 8, 272, 383 
Partial pivoting, 113, 466, 467
Particular solution, 155, 156, 159

Permutation, 44, 47, 231, 257 

Perpendicular, 12, 14 (see Orthogonality) 

Perpendicular eigenvectors, 203, 339 

Perron-Frobenius Theorem, 434 

Pivcol, 146 

Pivot, 45, 46, 55, 256, 333, 351, 466 

Pivot columns, 133, 135, 138, 144, 146, 173, 185


Pivot rows, 185 

Pivot variable, 135, 155 

Pixel, 364, 462 

Plane, 6, 26 

Plane rotation, 471 

Poisson distribution, 454 

Polar coordinates, 274, 281, 495-497 
Polar decomposition, 402, 403 
Positive eigenvalues, 342 

Positive pivots, 343 

Potential, 423 

Power method, 487 

Preconditioner, 481, 486 

Principal axes, 330 

Principal Component Analysis, 457 
Probability, 432, 453, 454 

Product of pivots, 63, 85, 244, 333 
Projection, 206-217, 219, 233 
Projection on line, 207, 208 
Projection on subspace, 209, 210 
Projective space, 460 
Pseudoinverse, 199, 399, 403, 404, 407 
Pythagoras, 14, 20 

PYTHON, 16, 100 


Q 
QR factorization, 243, 564
QR method, 360, 487, 490


R 
Random, 21, 55, 348, 373, 562
Range, 376, 377, 380

Rank, 144, 159, 160, 166 

Rank of AB, 153, 194, 217 

Rank one, 145, 150, 152, 189 
Rayleigh quotient, 476 

Real eigenvalues, 330, 331 
Recursion, 213, 228, 260, 392, 513 
Reduced cost, 443, 444 


Reduced echelon form (rref), 85, 134, 138, 148, 166, 564


Reflection, 232, 243, 286, 336, 471 
Regression, 453 

Repeated eigenvalues, 299, 320, 322 
Residual, 222, 481, 492 

Reverse order, 82, 107 

Right angle, 14 (see Orthogonality) 
Right hand rule, 276 

Right-inverse, 81, 86, 154, 405 
Rotation, 231, 289, 460, 471, 474 
Roundoff error, 371, 466, 477, 478 
Row exchange, 47, 59, 113, 245, 253 
Row picture, 31, 34, 40 

Row reduced echelon form, 85 

Row space C(A^T), 171, 184


S 

Saddle, 353 

Scalar, 2, 32 

Schur complement, 72, 94, 348 
Schur’s Theorem, 335, 341 
Schwarz inequality, 16, 20, 447 
Search engine, 373 

Second difference, 316, 322, 336 
Second order equation, 314-317 
Shake a Stick, 474 

Shift, 375 

Sigma notation, 56 

Simplex method, 440, 443 
Singular value, 363, 365, 371, 476 


Singular Value Decomposition, see SVD 
Singular vector, 363, 408 
Skew-symmetric, 289, 320, 327, 338, 341 
Solvable, 124, 157, 163 

Span, 125, 131, 168, 171 

Special solution, 132, 136, 146, 147 
Spectral radius, 479, 480, 482 

Spectral Theorem, 330, 335, 564 
Spiral, 316 

Square root, 402 

Square wave, 449, 451 

Stability, 316-318, 329 

Standard basis, 172, 388 

Statistics, 228, 453 

Steady state, 325, 431, 433, 434 
Stretching, 366, 411, 415 

Submatrix, 106, 153 

Subspace, 121, 122, 127, 184-194 

Sum of spaces, 131, 183 

Sum of squares, 344, 347, 350 
Supercomputer, 465 

SVD, 363, 368, 370, 383, 399, 401, 457 


T 
Teaching Code, 99, 149, 566 

Three steps, 302, 303, 313, 319, 329 
Toeplitz, 106, 474 

Trace, 288, 289, 295, 309, 318 
Transformation, 375 

Transpose, 107, 249, 502 

Transpose of AB and A^{-1}, 107
Tree, 307, 423 

Triangle, 10, 271 

Triangle inequality, 16, 18, 20, 480 
Tridiagonal (see Matrix) 

Triple product, 276, 282 

U 

Uncertainty, 305, 310 


Unique solution, 157 
Unit vector, 12, 13, 230, 234, 307 
Upper triangular, 45, 236 


V 

Variance, 228, 453, 454 
Vector, 2, 3, 121, 447 

Vector addition, 2, 3, 33, 121 
Vector space, 120, 121, 127 
Voltage, 423 

Volume, 245, 274, 281 


W 

Wave equation, 322, 323 

Wavelet, 391 

Weighted least squares, 453, 456, 458 
Woodbury-Morrison, 93 

Words, 75, 80 


Index of Symbols 

Ax = b, 23, 33
Ax = λx, 287
(A - λI)x = 0, 288
(AB)^{-1} = B^{-1}A^{-1}, 82
(AB)^T = B^T A^T, 107
(Ax)^T y = x^T(A^T y), 108, 118
A = LU, 95, 97, 106, 564
A = uv^T, 145, 152
A = LPU, 112, 564
A = LDL^T, 110, 353, 564
A = LDU, 97, 105, 564
A = MJM^{-1}, 358, 565
A = QH, 402, 565
A = QR, 235, 243, 564
A = QΛQ^T, 330, 332, 335, 347, 564
A = QTQ^{-1}, 335, 565
A = SΛS^{-1}, 298, 302, 311, 564
A = UΣV^T, 363, 365, 565
A^T A, 110, 211, 216, 365, 429
A^T A x̂ = A^T b, 210, 218, 404
A^T C A, 412, 413
A^k = SΛ^k S^{-1}, 299, 302
AB = BA, 305
C(A), 125
C(A^T), 171, 184
det(A - λI) = 0, 287
e^{At}, 314, 319, 320, 327
e^{At} = Se^{Λt}S^{-1}, 319
EA = R, 149, 187, 564
N(A), 132
N(A^T), 184
P = A(A^T A)^{-1} A^T, 211
PA = LU, 112, 564
Q^T Q = I, 230
R^n, 120
C^n, 120, 491
rref, 138, 154, 564
u = e^{λt} x, 312
V^⊥, 197
w = e^{2πi/n}, 497, 509
x^+ = A^+ b, 404, 408


Linear Algebra Websites 


math.mit.edu/linearalgebra Dedicated to help readers and teachers working with this book 


ocw.mit.edu. MIT’s OpenCourseWare site including video lectures in 18.06 and 18.085-6 


web.mit.edu/18.06 Current and past exams and homeworks with extra materials 


wellesleycambridge.com Ordering information for books by Gilbert Strang 


LINEAR ALGEBRA IN A NUTSHELL 
(The matrix A is n by n)


Nonsingular


A is invertible
The columns are independent
The rows are independent
The determinant is not zero
Ax = 0 has one solution x = 0
Ax = b has one solution x = A^{-1}b
A has n (nonzero) pivots
A has full rank r = n
The reduced row echelon form is R = I
The column space is all of R^n
The row space is all of R^n
All eigenvalues are nonzero
A^T A is symmetric positive definite
A has n (positive) singular values


Singular


A is not invertible
The columns are dependent
The rows are dependent
The determinant is zero
Ax = 0 has infinitely many solutions
Ax = b has no solution or infinitely many
A has r < n pivots
A has rank r < n
R has at least one zero row
The column space has dimension r < n
The row space has dimension r < n
Zero is an eigenvalue of A
A^T A is only semidefinite
A has r < n singular values




The diagram on the front cover shows the four fundamental subspaces
for the matrix A. Those subspaces lead to the Fundamental Theorem
of Linear Algebra:


1. The dimensions of the four subspaces
2. The orthogonality of the two pairs
3. The best bases for all four subspaces

This is the textbook that accompanies the author's vide 


and the review material on MIT's OpenCourseWare 
ocw.mit.edu and web.mit.edu/18.06 


Many universities and colleges (and now high schools) use this 
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